Note on local mixing techniques for stochastic differential equations

This paper discusses several techniques which may be used for applying the coupling method to solutions of stochastic differential equations (SDEs). They all work in dimension $d\ge 1$, although in $d=1$ the most natural way is to use intersections of trajectories, which requires nothing but the strong Markov property and non-degeneracy of the diffusion coefficient. In dimensions $d>1$ it is possible to use embedded Markov chains, either by considering discrete times $n=0,1,\ldots$ or by arranging special stopping time sequences, and to use a local Markov--Dobrushin (MD) condition. Further applications may be based on one or another version of the MD condition. For studies of convergence and mixing rates the process must be strong Markov and recurrent; however, recurrence is a separate issue which is not discussed in this paper.


Introduction
The stochastic differential equation (SDE) in $\mathbb{R}^d$
$$
dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \quad t \ge 0, \qquad X_0 = x, \eqno(1)
$$
is considered. Here $(W_t),\, t \ge 0$, is a $d$-dimensional Wiener process, and $b$ and $\sigma$ are vector- and matrix-valued Borel measurable functions of dimensions $d$ and $d \times d$, respectively.
It is assumed that equation (1) has a (weak or strong) solution which is a strong Markov process; see [6]. Naturally, under this condition the process $X_n$, that is, our solution $X_t$ considered at integer times $t = 0, 1, \ldots$, is a Markov chain (MC), which is, of course, also strong Markov. The advantage of the total variation distance (although it is not unique in this respect) for Markov processes is that once it is established that, for example,
$$
\|\mu_n - \mu\|_{TV} \le \psi(n), \quad n = 0, 1, \ldots,
$$
with some non-increasing rate function $\psi$, where $\mu_t$ is the marginal distribution of $X_t$ and $\mu$ is a probability measure (the ergodic limit for $(\mu_n)$), then this rate of convergence can be nearly verbatim transported to continuous time:
$$
\|\mu_t - \mu\|_{TV} \le \psi([t]), \quad t \ge 0,
$$
where $[t]$ is the integer part of $t$. Consider two independent versions of our Markov process $X_t$ (in continuous time), say, $X^1_t$ and $X^2_t$, with two different initial values $x_1$ and $x_2$ (or initial distributions), respectively. Since we allow weak solutions, the processes $X^1_t$ and $X^2_t$, generally speaking, are defined on two different probability spaces with two different Wiener processes; without loss of generality, we may assume that they are independent and consider their direct product; thus, we have two trajectories $X^1$ and $X^2$ on the same probability space (recall that the Wiener processes are also different and independent of each other). Denote by $Q(x, dx')$ the transition kernel, $Q(x, dx') = P_x(X_1 \in dx')$.
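As an illustration of how a total variation estimate at one time transports between marginal laws, consider the Ornstein--Uhlenbeck process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, whose marginals are explicit Gaussians. The following Python sketch (a toy example of ours; the function names are chosen for this note only) computes the total variation distance between two marginal distributions by numerical integration:

```python
import numpy as np

def ou_marginal(x0, t):
    # Marginal law of dX_t = -X_t dt + sqrt(2) dW_t started from x0:
    # X_t ~ N(x0 * exp(-t), 1 - exp(-2t)); the invariant law is N(0, 1).
    return x0 * np.exp(-t), 1.0 - np.exp(-2.0 * t)

def tv_between_marginals(x1, x2, t, half_width=12.0, n=40001):
    """Total variation distance between mu_t^{x1} and mu_t^{x2}, computed as
    (1/2) * integral of |p1 - p2| over a fine grid (Riemann sum)."""
    grid = np.linspace(-half_width, half_width, n)
    dx = grid[1] - grid[0]
    dens = []
    for x0 in (x1, x2):
        m, v = ou_marginal(x0, t)
        dens.append(np.exp(-(grid - m) ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v))
    return float(0.5 * np.sum(np.abs(dens[0] - dens[1])) * dx)
```

For this explicit model the distance decreases in $t$, in line with lemma 2 below, and can be cross-checked against the closed-form value $2\Phi(m/\sigma) - 1$ for two Gaussians with a common variance.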

Definition 1 A (global) Markov--Dobrushin (MD) condition holds for the Markov process $(X_t)$ iff
$$
\inf_{x_1, x_2} \int \left( \frac{Q(x_1, dx')}{Q(x_2, dx')} \wedge 1 \right) Q(x_2, dx') > 0.
$$
Here it is not assumed that $Q(x_1, dx')$ is necessarily absolutely continuous with respect to $Q(x_2, dx')$; rather, the density of its absolutely continuous component is taken under the integral. In what follows a localised version of this condition will be stated, and this localised version will be the main object of interest in this paper. General approaches to coupling for SDEs require (usually positive) recurrence and some form of local mixing. For the latter, besides intersections, applicable only in the case $d = 1$, the following tools may be used.
• Lower and upper transition density bounds (this requires Hölder coefficients and "elliptic" or "hypoelliptic" non-degeneracy); here the petite set condition, popular in the discrete-time theory of ergodic Markov chains, may be used along with recurrence properties.
• Lower and upper bounds for the transition density of the equation without the drift, including degenerate and highly degenerate cases; here Girsanov's transformation is an efficient tool, while petite set conditions, generally speaking, do not work.
• In the absence of lower and upper density bounds, under the non-degeneracy condition and for general measurable coefficients of the SDE, Harnack inequalities in parabolic or elliptic versions may be applied; a petite set condition apparently may be proved, but it is less efficient than MD because the latter guarantees better convergence rate estimates.
Hence, the goal of this paper is to attract more attention to the MD condition, which, in the humble opinion of the author, this condition deserves. There is also a hope that this list of available techniques may help in the future in studying ergodic properties of more general classes of processes. One more point is that, except for the method based on lower and upper density bounds, in all other more involved situations the popular DD (Doeblin--Doob) condition [3], or its local version, the petite set condition, is difficult to apply to SDEs, unlike the MD one; and even when it can be applied, the MD condition requires weaker assumptions and provides better convergence rate estimates.
Note that for discrete time stochastic models, and in certain cases for continuous time, too, one more natural approach to coupling is to use regeneration. Unfortunately, for general SDEs this method is not available, so we do not discuss it here, although the multidimensional coupling constructions for processes with continuous distributions are sometimes called "generalised regeneration".
Let us warn the reader that most of the results of this paper are known, perhaps, in a slightly different form; we just collect them here together. For simplicity we do not touch more general equations such as SDEs with jumps.
However, in principle, more general Markov processes, in particular, SDEs with jumps, may also be tackled with the help of similar techniques.
It should also be highlighted that all methods discussed in what follows (except for the elliptic Harnack inequality) can be applied, with minor differences, to nonhomogeneous SDEs, too, except that convergence would be in the total variation distance between the marginal distributions corresponding to any two initial measures, not to the invariant measure, which in general does not exist in this situation.
The paper consists of two sections: this introduction and the main section 2; in turn, section 2 is split into six sub-sections, most of them related to one of the coupling tools listed above. The majority of proofs are sketched or dropped because the results are known; the only exception is the part about the elliptic Harnack inequality, which is new to the best of the author's knowledge and where the details of the proof are shown. The paper presents a set of various tools for coupling for SDEs. Neither recurrence, the necessary second ingredient in studying convergence and mixing rates, nor coupling itself (except for the basic lemma 1 added for the reader's convenience) is the goal of this paper.

Coupling via intersections in dimension one

In the 1D case, for local coupling we can use intersections of two independent solutions of the same SDE with different initial values. Assume that $X_t$ and $X'_t$ are two solutions of equation (1) with different initial values $X_0 = x$ and $X'_0 = x'$ in the one-dimensional case. The basis for applying coupling via intersections is the following result.
The first meeting time $\tau := \inf(t \ge 0 : X_t = X'_t)$ is a stopping time; moreover, for initial values $x, x' \in (-1, 1)$ there exists a non-random $T > 0$ such that $P(\tau \le T)$ is bounded away from zero.
Proof follows from the following two elementary steps. 1. Change time for both SDEs, making the diffusion coefficients equal to one. There is no need to make it the same random time change: generally speaking, the latter is not possible unless the diffusion coefficient is constant. Since $\sigma$ and $\sigma^{-1}$ are bounded, the image of the interval $[0, 1]$ under this change is random, but for both processes it contains some non-random interval $[0, T]$. This can also be applied to non-homogeneous SDEs with coefficients depending on time.
2. The random time change leaves the drift bounded. Hence, due to Girsanov's transformation of measure, the probability that the process with the lower initial value attains the level $+1$ over $[0, T]$ is positive and bounded away from zero. Similarly, the probability that the process with the bigger initial value attains the level $-1$ over $[0, T]$ is positive and bounded away from zero. Therefore, by continuity of trajectories, they meet on $[0, T]$ with a positive probability which is bounded away from zero, as required. QED
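The two-step argument above can be illustrated numerically. The following Python sketch (our own toy example: the hypothetical model $dX_t = -X_t\,dt + dW_t$, with unit diffusion as after the time change) estimates by the Euler scheme the probability that two independent solutions started from different points cross before time $T$; a crossing is detected through a sign change of their difference:

```python
import numpy as np

def meet_probability(x1, x2, T=1.0, n_steps=1000, n_paths=2000, seed=0):
    """Monte Carlo estimate of P(tau <= T) for the first meeting time of two
    independent Euler-Maruyama solutions of dX = -X dt + dW."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    a = np.full(n_paths, float(x1))
    b = np.full(n_paths, float(x2))
    met = np.zeros(n_paths, dtype=bool)
    sign0 = np.sign(a - b)
    for _ in range(n_steps):
        # independent driving noises, as in the direct-product construction
        a = a - a * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        b = b - b * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        met |= np.sign(a - b) != sign0  # sign change => trajectories crossed
    return float(met.mean())
```

The estimate is, of course, only a lower bound on the true meeting probability, since crossings strictly inside a time step may be missed by the discrete monitoring.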

MD, "case b" & "petite set" conditions
In dimensions $d > 1$ intersections do not work for "normal" SDEs, and we now switch to the main topic of this paper, local mixing conditions. Global and local versions of the "petite set" and Markov--Dobrushin (MD) conditions will be stated. Most frequently either of them is applied in its local variant, but the global versions also work in cases of uniform ergodicity. It should be noted that, in fact, local versions may vary slightly depending on the particular setting; we only show their main forms. The "petite set" condition is a localised version of the "case (b)" condition from [3, Chapter V, Section 5], which is a simplification of the "condition D" (nowadays called the Doeblin--Doob condition) from the same chapter in [3]. Let us highlight that the MD condition may also come in a global or a local form.

Definition 2
The process satisfies the condition "b" (from [3, Chapter V]) iff there exist a probability measure $\nu$ on the state space $\mathcal{X}$ and constants $T, c > 0$ such that
$$
P_x(X_T \in A) \ge c\, \nu(A) \quad \text{for all } x \in \mathcal{X} \text{ and all Borel } A \subset \mathcal{X}. \eqno(5)
$$

Definition 3 The process satisfies the local condition "b", or "petite set" condition, iff there exist a set $D \subset \mathcal{X}$, a probability measure $\nu$ on $D$, and constants $T, c > 0$ such that
$$
P_x(X_T \in A) \ge c\, \nu(A) \quad \text{for all } x \in D \text{ and all Borel } A \subset D.
$$

See, in particular, [9] about the usage of the petite set condition in convergence studies. Recall that normally this local condition, as well as the local MD condition in definition 5 in the next paragraphs, should be accompanied by certain recurrence assumptions or properties; however, as was said earlier, recurrence is not the goal of this paper, and it makes sense to study it separately. The global conditions "case b" and MD both lead to efficient exponential convergence, uniform in the initial data.

Definition 4
The following is called the global Markov--Dobrushin condition: there exists $T > 0$ such that
$$
\kappa(T) := \inf_{x_1, x_2 \in \mathcal{X}} \int \left( \frac{\mu^{x_1}_T(dy)}{\mu^{x_2}_T(dy)} \wedge 1 \right) \mu^{x_2}_T(dy) > 0, \eqno(6)
$$
where $\mu^x_T$ denotes the distribution of $X_T$ given $X_0 = x$.

Definition 5 The following is called a local Markov--Dobrushin condition: there exist sets $D, D' \subset \mathcal{X}$ in the state space and a constant $T > 0$ such that
$$
\kappa(D, D'; T) := \inf_{x_1, x_2 \in D} \int_{D'} \left( \frac{\mu^{x_1}_T(dy)}{\mu^{x_2}_T(dy)} \wedge 1 \right) \mu^{x_2}_T(dy) > 0. \eqno(7)
$$

Remark 1 Usually, but not necessarily, $D' = D$; in this case we use the notation $\kappa(D, D'; T) =: \kappa(D; T)$. Another possibility is $D' = \mathbb{R}^d$. A sufficient condition for (7) is as follows: there exists a dominating measure $\nu(dy)$ such that $\mu^x_T(dy) \ll \nu(dy)$ for any $x \in D$, and
$$
\inf_{x_1, x_2 \in D} \int_{D'} \min\left( \frac{\mu^{x_1}_T(dy)}{\nu(dy)}, \frac{\mu^{x_2}_T(dy)}{\nu(dy)} \right) \nu(dy) > 0. \eqno(8)
$$
In general, there might be no dominating measure for all $x$ simultaneously. Yet, as we shall see, (8) may be verified in most of the cases in what follows.
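The sufficient condition (8) is straightforward to evaluate numerically when transition densities are available. A minimal Python sketch (assuming, purely for illustration, Gaussian transition kernels with a common variance and the Lebesgue dominating measure; all function names are ours):

```python
import numpy as np

def gauss_density(grid, mean, var):
    # one-dimensional Gaussian density on a grid
    return np.exp(-(grid - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def md_overlap(m1, m2, var, half_width=15.0, n=60001):
    """Numerical value of int min(p1, p2) dnu for two Gaussian transition
    densities, with nu = Lebesgue measure (Riemann-sum integration)."""
    grid = np.linspace(-half_width, half_width, n)
    dx = grid[1] - grid[0]
    return float(np.sum(np.minimum(gauss_density(grid, m1, var),
                                   gauss_density(grid, m2, var))) * dx)

def md_constant(points, var):
    """Infimum of pairwise overlaps over a finite set of starting points
    approximating the set D in (8)."""
    return min(md_overlap(a, b, var) for a in points for b in points)
```

For two Gaussians with the same variance the overlap has the closed form $2\Phi(-|m_1 - m_2|/(2\sigma))$, which can be used to verify the numerics.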
Clearly, the "petite set" condition implies the MD one, both in the global ("case b") and local versions; for example, (5) implies (6); but, generally speaking, not vice versa. The basis for applying coupling via any of them is the following coupling lemma (not to be confused with the coupling inequality). Let us add that the MD condition admits some further generalisation, see [15, 16], which in certain cases provides a slightly better convergence bound under slightly wider assumptions. However, this note is just about the tools which allow one to check a local MD condition for non-degenerate SDEs. The following lemma clarifies why the MD condition is so useful; at the same time it serves as the basis for the further application of the MD condition to the coupling technique for Markov processes.
Lemma 1 ("Of two random variables") Let $X_1$ and $X_2$ be two random variables on their probability spaces $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$ (without loss of generality the spaces are different, and the variables become independent after we take the direct product of the spaces), with densities $p_1$ and $p_2$ with respect to some reference measure $\Lambda$, respectively. If
$$
\kappa := \int \min(p_1(x), p_2(x))\, \Lambda(dx) > 0,
$$
then there exists one more probability space $(\Omega, \mathcal{F}, P)$ and two random variables $\tilde X_1, \tilde X_2$ on it such that $\tilde X_i$ equals $X_i$ in distribution, $i = 1, 2$, and $P(\tilde X_1 = \tilde X_2) \ge \kappa$. This is a well-known technical tool of the coupling method. The proof, which is simple enough, may be found, for example, in [13]. This reference should not be regarded as a claim that this lemma belongs to the author, although who the first inventor of this lemma was is not clear to him. The next lemma justifies the hint that in order to estimate the rate of convergence of a Markov process $(X_t, t \ge 0)$ in continuous time to its invariant measure (assuming that it exists), it suffices to evaluate it at discrete times $n = 0, 1, \ldots$ Its elementary proof is provided for the reader's convenience. Let $\mu^X_t$ be the marginal distribution of $X_t$, and let $\mu^X$ be its invariant measure.
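A sketch of the construction behind lemma 1 in the discrete case (our own minimal Python illustration, not the construction from [13]): with probability $\kappa$ both variables are drawn from the normalised common part $\min(p_1, p_2)/\kappa$, and otherwise they are drawn independently from the normalised residuals, whose supports are disjoint, so that $P(\tilde X_1 = \tilde X_2) = \kappa$ exactly:

```python
import numpy as np

def maximal_coupling_sample(p1, p2, n, seed=0):
    """Sample n pairs (X1, X2) with marginals p1, p2 (finite distributions)
    such that P(X1 = X2) equals kappa = sum_i min(p1_i, p2_i)."""
    rng = np.random.default_rng(seed)
    p1 = np.asarray(p1, float)
    p2 = np.asarray(p2, float)
    common = np.minimum(p1, p2)
    kappa = common.sum()
    out = np.empty((n, 2), dtype=int)
    for k in range(n):
        if rng.random() < kappa:
            # draw from the common (overlap) part: the two copies coincide
            i = rng.choice(len(p1), p=common / kappa)
            out[k] = (i, i)
        else:
            # draw independently from the residuals, whose supports are disjoint
            r1 = (p1 - common) / (1.0 - kappa)
            r2 = (p2 - common) / (1.0 - kappa)
            out[k] = (rng.choice(len(p1), p=r1), rng.choice(len(p2), p=r2))
    return out, kappa
```

The residual supports are disjoint because $p_1(i) > \min(p_1, p_2)(i)$ forces $p_2(i) = \min(p_1, p_2)(i)$, so in the second branch the two coordinates never coincide.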

Lemma 2 It holds that for any $0 \le s \le t$,
$$
\|\mu^X_t - \mu^X\|_{TV} \le \|\mu^X_s - \mu^X\|_{TV}.
$$
Proof. Due to the Markov property of $X$ and the Chapman--Kolmogorov equation, and using the convention $a_+ = a \vee 0$, $a_- = (-a) \vee 0$, together with the invariance of $\mu^X$, we get for $0 \le s \le t$:
$$
\|\mu^X_t - \mu^X\|_{TV} = 2 \sup_A \left| \int (\mu^X_s - \mu^X)(dx)\, P_x(X_{t-s} \in A) \right| \le 2 \int (\mu^X_s - \mu^X)_+(dx) = \|\mu^X_s - \mu^X\|_{TV},
$$
as required. QED

MD using lower and upper transition density bounds
Assume $d > 1$ and Gaussian type upper and lower bounds for the transition densities, which can be established under the non-degeneracy of $\sigma\sigma^*$ and Hölder coefficients [4, 5, 12], as well as under certain hypoellipticity conditions (see, for example, [2, 8, 10], et al.). In particular, let $\sigma\sigma^*$ be uniformly non-degenerate, and let both $\sigma$ and $b$ satisfy Hölder conditions: there exist $L, \alpha > 0$ such that for any $x, x' \in \mathbb{R}^d$,
$$
|b(x) - b(x')| + \|\sigma(x) - \sigma(x')\| \le L |x - x'|^{\alpha}.
$$
As follows from the PDE theory (see the references above), under such conditions for any $t > 0$ there exist constants $C_t, C'_t, c_t, c'_t > 0$ such that Gaussian type lower and upper bounds hold true for the transition densities $f_t(x, x')$ (fundamental solutions in the PDE language):
$$
C'_t \exp\left( -\frac{|x - x'|^2}{c'_t} \right) \le f_t(x, x') \le C_t \exp\left( -\frac{|x - x'|^2}{c_t} \right).
$$
In particular, under the non-degeneracy condition on $\sigma\sigma^*$ one may take $C_t = C t^{-d/2}$, $C'_t = C' t^{-d/2}$, $c_t = ct$, $c'_t = c't$ with some constants $C, C', c, c'$, and under the hypoelliptic conditions it is also known how to evaluate all these constants. Then, clearly, a local "petite set" condition is satisfied with any bounded domain $D$ (an open set by definition) and with the (normalised) Lebesgue measure on $D$ as $\nu$. Hence, the MD condition is also valid. To the best of the author's knowledge this is the only case, although this class of coefficients is wide enough, if far from the most general, where the "petite set" condition can be applied to Markov SDEs in order to arrange coupling.
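Given the lower Gaussian bound, a petite set constant for a ball $B_R$ can be computed by integrating the uniform density floor over the ball. A small Python sketch (the numerical values of the constants $C'$ and $c'$ below are hypothetical, chosen purely for illustration):

```python
import numpy as np
from math import gamma, pi

def minorization_constant(R, t, C_low, c_low, d):
    """Constant c such that Q_t(x, A) >= c * nu(A) for all x in B_R, where
    nu is the normalised Lebesgue measure on B_R, assuming the lower bound
    f_t(x, y) >= C_low * t**(-d/2) * exp(-|x - y|**2 / (c_low * t))."""
    # For x, y in B_R we have |x - y| <= 2R, hence a uniform density floor:
    density_floor = C_low * t ** (-d / 2.0) * np.exp(-(2.0 * R) ** 2 / (c_low * t))
    # Lebesgue volume of the d-dimensional ball of radius R:
    leb_ball = pi ** (d / 2.0) / gamma(d / 2.0 + 1.0) * R ** d
    # Q_t(x, A) >= density_floor * Leb(A) = density_floor * Leb(B_R) * nu(A)
    return density_floor * leb_ball
```

The constant degrades as $R$ grows (through the exponential) and as $t \downarrow 0$ (through $t^{-d/2}$ versus $e^{-4R^2/(c_{low} t)}$), which is the usual trade-off in choosing the localisation.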

MD using stochastic exponentials
In this section let us assume that lower and upper bounds for the transition densities hold true for the SDE with a "truncated drift" $b_1$, while the goal is to arrange local coupling for the full SDE with the more involved drift of the form
$$
b = b_1 + b_2,
$$
where $b_2$ is just Borel measurable and bounded (this boundedness may be relaxed). We are interested in establishing an MD condition for the full equation (1). Note that the upper and lower bounds from the previous subsection are, in general, not applicable here. Denote $\hat b_2(x) := \sigma^{-1}(x) b_2(x)$, and let
$$
\rho_T := \exp\left( \int_0^T \hat b_2(X_t)\, dW_t - \frac{1}{2} \int_0^T |\hat b_2(X_t)|^2\, dt \right),
$$
where $X$ is the solution of the equation with the drift $b_1$. Recall that $\rho_T$ is a probability density for any $T > 0$. Denote by $\mu_t$ the marginal distribution of $X_t$.
Theorem 1 (local MD condition via Girsanov) For any $T > 0$ and $R > 0$,
$$
\kappa(B_R; T) = \inf_{x_1, x_2 \in B_R} \int_{B_R} \left( \frac{\mu^{x_1}_T(dy)}{\mu^{x_2}_T(dy)} \wedge 1 \right) \mu^{x_2}_T(dy) > 0.
$$
This inequality suffices for applications to coupling and convergence rates (given suitable recurrence estimates). For the proof of very close statements (actually, even for degenerate SDEs) see [1, 14]. Some other localised versions of this result may be established: for example, the balls $B_R$ under the infimum sign and as the domain of integration may, actually, differ.
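The fact that $E\rho_T = 1$ for bounded $\hat b_2$ (Novikov's condition holds trivially) can be checked by simulation. A Python sketch of ours, in which, for simplicity, the truncated-drift solution in the exponent is replaced by the driving Wiener process itself and the bounded correction is the hypothetical choice $\hat b_2 = \tanh$; both integrals are discretised by the Euler scheme with left-endpoint evaluation:

```python
import numpy as np

def mean_stochastic_exponential(bhat, T=1.0, n_steps=400, n_paths=20000, seed=0):
    """Monte Carlo estimate of E[rho_T] for
    rho_T = exp( int_0^T bhat(W_t) dW_t - (1/2) int_0^T bhat(W_t)^2 dt ),
    which equals 1 for bounded bhat."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    W = np.zeros(n_paths)
    log_rho = np.zeros(n_paths)
    for _ in range(n_steps):
        b = bhat(W)  # integrand evaluated at the left endpoint (Ito convention)
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        log_rho += b * dW - 0.5 * b * b * dt
        W += dW
    return float(np.exp(log_rho).mean())
```

Since $|\tanh| \le 1$, the second moment of $\rho_T$ is bounded, so the Monte Carlo average is stable; an unbounded $\hat b_2$ could make such a naive check unreliable.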

MD using parabolic Harnack inequalities
As usual in this paper, in this section we assume that $d \ge 1$, the coefficients $b$ and $\sigma$ are bounded (which can be relaxed by a localisation) and Borel measurable, and $\sigma\sigma^*$ is uniformly non-degenerate. Under such conditions the Krylov--Safonov parabolic Harnack inequality holds true [7, Theorem 1.1]; stated here in terms of probabilities rather than solutions of PDEs, it reads: for $|x_1|, |x_2| \le 1/2$,
$$
\frac{P_{0, x_1}\big( (\tau, X_\tau) \in d\gamma \big)}{P_{\epsilon, x_2}\big( (\tau, X_\tau) \in d\gamma \big)} \le N, \quad \gamma \in \Gamma_\epsilon, \eqno(10)
$$
where $\Gamma_\epsilon$ is the parabolic boundary of the cylinder $((t, x) : \epsilon < t < 1, |x| < 1)$, $\tau$ is the exit time of the space-time process $(t, X_t)$ from this cylinder, $P_{s, x}$ refers to the solution started from $x$ at time $s$, and the constant $N$ depends on $d$, on the ellipticity constants of the diffusion, on the sup-norm of the drift, and on $\epsilon$. Note that in (10) the measure in the numerator is absolutely continuous with respect to the one in the denominator, that is, there is no singular component in this situation. Let
$$
\mu^{x_1}(d\gamma) := P_{0, x_1}\big( (\tau, X_\tau) \in d\gamma \big), \qquad \mu^{\epsilon, x_2}(d\gamma) := P_{\epsilon, x_2}\big( (\tau, X_\tau) \in d\gamma \big), \eqno(11)
$$
where $d\gamma$ is the element of the boundary $\Gamma_\epsilon$. Then the following local mixing bound holds true.
Theorem 2 (local MD via parabolic Harnack) Let $\mu^{x_1}(\Gamma_\epsilon) \ge q$ with some $q > 0$. Then a version of the Markov--Dobrushin condition holds:
$$
\int_{\Gamma_\epsilon} \left( \frac{\mu^{\epsilon, x_2}(d\gamma)}{\mu^{x_1}(d\gamma)} \wedge 1 \right) \mu^{x_1}(d\gamma) \ge \frac{q}{N}. \eqno(12)
$$
Note that the value $q$ here may be chosen arbitrarily close to one if $\epsilon > 0$ is small enough. However, the decrease of $\epsilon$ implies the increase of the constant $N$ in (10).
Proof. Indeed, due to the inequality (10) we have
$$
\mu^{x_1}(d\gamma) \le N \mu^{\epsilon, x_2}(d\gamma) \quad \text{on } \Gamma_\epsilon.
$$
Denote by $\bar\mu^{\epsilon, x_2}$ the absolutely continuous part of $\mu^{\epsilon, x_2}$ with respect to $\mu^{x_1}$ (we do not know whether there exists a singular component here, but the calculus in what follows does not depend on this). Then
$$
\frac{\bar\mu^{\epsilon, x_2}(d\gamma)}{\mu^{x_1}(d\gamma)} \ge \frac{1}{N}, \quad \mu^{x_1}\text{-a.e. on } \Gamma_\epsilon.
$$
Hence, the assumption $\mu^{x_1}(\Gamma_\epsilon) \ge q$ yields (12). QED

Sometimes it may be more convenient to use another version of the MD condition, which follows from theorem 2. Denote by $\mu^x_t(dy)$ the distribution of $X_t$ given $X_0 = x$.

Corollary 1 Under the assumptions of theorem 2 the following version of the MD condition holds: there exists $q' \in (0, q)$ such that
$$
\int_{\mathbb{R}^d} \left( \frac{\mu^{x_2}_1(dy)}{\mu^{x_1}_1(dy)} \wedge 1 \right) \mu^{x_1}_1(dy) \ge q'. \eqno(13)
$$
Note that here $\mathbb{R}^d$ plays the role of $D'$ in the MD condition. In some cases this may not be convenient; however, using moment bounds for the solution, a reasonable version of this inequality with some bounded ball $B_R$ in place of $\mathbb{R}^d$ is, of course, possible. We leave it till further studies where such a replacement may be required.
Proof. Note that due to the boundedness of $\sigma$ and $b$ the solution remains close to its starting point over a short time with high probability. Hence, denoting $\nu^{\epsilon, x_2}(dz) := P_{x_2}(X_\epsilon \in dz)$ and conditioning at time $\epsilon$ by the Markov property, we obtain a lower bound for the left hand side of (13). This was the first step in the reduction of the MD characteristics in the left hand side of (13) to the Harnack inequality: now we may deal with the measures $\mu^{\epsilon, z}_1(dy) := P(X_1 \in dy \mid X_\epsilon = z)$ and $\mu^{x_1}_1(dy)$. However, these are still not the ones which show up in (10) or in (11). The next step will complete this reduction. Let $\Lambda^{x_1, z}(dy) := \mu^{\epsilon, z}_1(dy) + \mu^{x_1}_1(dy)$.

MD using elliptic Harnack inequalities
The assumptions of this section are the same as in the previous one: $d \ge 1$, the coefficients $b$ and $\sigma$ are bounded (which can be relaxed) and Borel measurable, and $\sigma\sigma^*$ is uniformly non-degenerate. We have the elliptic Harnack inequality due to [11, Theorem 3.1], stated here in its probabilistic form (while in [11] it is offered in the language of elliptic PDEs): there exists a constant $N > 0$ such that for any $0 < R \le 1$, any Borel set $A \subset \partial B_R$, and any $|x_1|, |x_2| \le R/2$,
$$
P_{x_1}(X_{\tau_R} \in A) \le N\, P_{x_2}(X_{\tau_R} \in A), \eqno(16)
$$
where $\tau_R = \inf(t \ge 0 : |X_t| \ge R)$ and $\partial B_R$ is the boundary of the ball $B_R$. This inequality is itself a form of the MD condition. In fact, it is not clear whether this version of the Harnack inequality may be helpful for estimating the rate of convergence of the distribution of $X_t$ to its stationary regime. Nevertheless, if it can be used for such a purpose, which is the author's hope, then it might be more convenient to apply the following version of the MD condition based on the inequality (16).
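For Brownian motion (the special case $b = 0$, $\sigma = \mathrm{Id}$, $d = 2$), the exit distribution from a disc is given by the classical Poisson kernel, and the comparability asserted in (16) can be observed directly. A Python sketch of ours (unit disc $R = 1$, starting points with $|x_i| = R/4$):

```python
import numpy as np

def poisson_kernel(r, phi, theta):
    """Density of the exit (harmonic) measure of 2D Brownian motion on the
    unit circle at angle theta, started from the interior point with polar
    coordinates (r, phi)."""
    return (1.0 - r * r) / (
        2.0 * np.pi * (1.0 - 2.0 * r * np.cos(theta - phi) + r * r))

def harnack_ratio(p1, p2, n=4000):
    """Maximum over the boundary of the ratio of the two exit densities;
    p1, p2 are (r, phi) polar coordinates of the starting points."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    k1 = poisson_kernel(p1[0], p1[1], theta)
    k2 = poisson_kernel(p2[0], p2[1], theta)
    return float((k1 / k2).max())
```

For two antipodal starting points at radius $r$ the worst ratio equals $((1 + r)/(1 - r))^2$, attained on the boundary point closest to the first of them; for $r = 1/4$ this gives $25/9 \approx 2.78$, a concrete instance of the constant $N$ in this special case.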
Note that $\bigcup_{T > 0} Q_{R,T} = [0, \infty) \times \bar B_R$, where $Q_{R,T} := ((t, x) : 0 \le t \le T, |x| \le R)$; let $\tau_{R,T} := \tau_R \wedge T$ denote the exit time of the space-time process $(t, X_t)$ from $Q_{R,T}$. Denote by $\Gamma_{R,T}$ the part of the parabolic boundary of $Q_{R,T}$ corresponding to $t < T$, namely, $\Gamma_{R,T} = ((t, x) : 0 \le t < T, |x| = R)$.

Theorem 3 (local MD via elliptic Harnack) The local MD condition
$$
\inf_{|x_1|, |x_2| \le R/8} \int_{\Gamma_{R,T}} \left( \frac{\mu^{x_2}_{R,T}(d\gamma)}{\mu^{x_1}_{R,T}(d\gamma)} \wedge 1 \right) \mu^{x_1}_{R,T}(d\gamma) > 0, \eqno(20)
$$
where $\mu^{x}_{R,T}(d\gamma) := P_x\big( (\tau_{R,T}, X_{\tau_{R,T}}) \in d\gamma \big)$,
holds true for all $T > 0$ large enough.
In principle, it is possible to evaluate the values of $T$ for which (20) holds, and this might be useful for estimating convergence rates, but, as was said earlier, we do not pursue this goal here. As already mentioned, unlike with the parabolic Harnack inequality, it is not clear how useful the local mixing property in the form (20) of (16) could be for coupling; this may be clarified in further studies.
Proof. Clearly, $\tau_{R,T} \le \tau_R$. Note that due to the non-degeneracy of $\sigma$ we have $\tau_R < \infty$ a.s., and, moreover,
$$
\sup_{|x| \le R/8} P_x(\tau_R > T) \to 0, \quad T \uparrow \infty.
$$
Equivalently, $P_x(\tau_{R,T} = \tau_R) \to 1$ as $T \uparrow \infty$, uniformly with respect to $|x| \le R/8$. Hence, as $T \uparrow \infty$, the exit measure $\mu^x_{R,T}$ restricted to $\Gamma_{R,T}$ converges in total variation to the exit distribution $P_x((\tau_R, X_{\tau_R}) \in \cdot)$, where the convergence is uniform with respect to $A$ and $|x| \le R/8$. The inequality (16) implies an analogous comparison inequality for the limiting exit distributions. As a consequence, for any $R > 0$ and each fixed $x_1$ with $|x_1| \le R/8$, the integral in (20) admits a positive lower bound for $T$ large enough; this follows by the monotone convergence theorem. However, technically this is still not sufficient because we want a similar inequality with the infimum $\inf_{x_1, x_2 \in B_{R/8}}$. Using the elementary inequality $a \wedge b \ge a - (a - b)_+$ together with the uniform convergence established above, we see that the contribution of the event $(\tau_R > T)$ is uniformly small, while the remaining term is bounded away from zero uniformly in $x_1, x_2 \in B_{R/8}$. Therefore, there exists $T_0$ such that for any $T \ge T_0$ the infimum in (20) is positive, which completes the proof. QED