Rates of approximation of nonsmooth integral-type functionals of Markov processes

We provide strong $L_p$-rates of approximation of nonsmooth integral-type functionals of Markov processes by integral sums. Our approach is, in a sense, process-insensitive and is based on a modification of some well-developed estimates from the theory of continuous additive functionals of Markov processes.


Introduction
Let $X_t$, $t \ge 0$, be an $\mathbb{R}^d$-valued Markov process. We study the integral functional
$$I_T(h) = \int_0^T h(X_t)\,dt$$
of this process. The most natural numerical scheme to approximate such a functional is the sequence of integral sums
$$I_{T,n}(h) = \frac{T}{n} \sum_{k=0}^{n-1} h(X_{kT/n}), \quad n \ge 1,$$
and the main objective of this paper is to study approximation rates within this scheme. The function $h$, in general, is not assumed to be smooth, and therefore the mapping $X \mapsto I_T(h)$ may fail to be Lipschitz continuous (or even continuous) on a natural functional space of the trajectories of $X$ (e.g., $C(0, T)$ or $D(0, T)$). This makes it impossible to carry out the error analysis with a classical technique (see, e.g., [5]). The typical case of interest here is $h = 1_A$, with $I_T(h)$ being the occupation time of $X$ in the set $A$ up to time $T$.
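The scheme described above can be illustrated by a minimal sketch. It rests on assumptions that are not part of the paper: $X$ is taken to be a one-dimensional standard Brownian motion simulated on a fine grid, $h$ is the indicator of $[0, \infty)$, the sum is taken over the left grid points $kT/n$, $k = 0, \dots, n-1$, and the integral functional itself is replaced by a fine-grid sum used as a proxy. The helper names `brownian_path` and `integral_sum` are hypothetical.

```python
import random


def indicator_nonneg(x):
    """h = 1_A with A = [0, inf): integrand of an occupation time."""
    return 1.0 if x >= 0.0 else 0.0


def brownian_path(T, m, rng):
    """Values of a standard Brownian motion at the m + 1 grid points j*T/m."""
    dt = T / m
    path = [0.0]
    for _ in range(m):
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path


def integral_sum(path, T, n, h):
    """Left-point sum (T/n) * sum_{k=0}^{n-1} h(X_{kT/n}).

    The path is stored on a grid with m = len(path) - 1 steps,
    and n must divide m so that every point kT/n lies on that grid.
    """
    m = len(path) - 1
    if m % n != 0:
        raise ValueError("n must divide the number of fine steps")
    step = m // n
    return (T / n) * sum(h(path[k * step]) for k in range(n))


rng = random.Random(0)
T, m = 1.0, 2 ** 12
path = brownian_path(T, m, rng)

# Assumption: the finest sum serves as a stand-in for the integral functional.
reference = integral_sum(path, T, m, indicator_nonneg)
for n in (2 ** 4, 2 ** 6, 2 ** 8):
    coarse = integral_sum(path, T, n, indicator_nonneg)
    print(n, coarse, abs(coarse - reference))
```

The printed differences concern a single simulated trajectory only; they say nothing about rates, which require averaging over many trajectories.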
In the paper, we establish strong $L_p$-approximation rates, that is, bounds for $E_x |I_T(h) - I_{T,n}(h)|^p$. Our research is strongly motivated by the recent paper [7], where this problem was studied in the particularly important case where $X$ is a one-dimensional diffusion, and we refer the reader to [7] for more motivation and background on the subject. The technique developed in [7], which involves both Malliavin calculus tools and Gaussian bounds for the transition probability density, relies substantially on the structure of the process, and hence it does not seem easy to extend this approach to other classes of processes, for example, multidimensional diffusions or solutions to Lévy driven SDEs.
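A strong $L_p$-error of this kind can be estimated empirically by Monte Carlo. The following is only a sketch under assumptions that do not come from the paper: the process is a one-dimensional standard Brownian motion started at 0, $h$ is the indicator of $[0, \infty)$, the integral functional is replaced by a left-point sum on a fine grid with `m` steps, and the function name `lp_error_estimate` and all its parameters are hypothetical.

```python
import random


def lp_error_estimate(n, p, T=1.0, m=1024, n_paths=200, seed=0):
    """Monte Carlo estimate of E|I_T(h) - I_{T,n}(h)|^p for h = 1_{[0, inf)}.

    Assumption: I_T(h) is approximated by a left-point sum with m fine steps,
    so the estimate is meaningful only when m is much larger than n.
    """
    if m % n != 0:
        raise ValueError("n must divide m")
    rng = random.Random(seed)
    dt = T / m
    step = m // n
    total = 0.0
    for _ in range(n_paths):
        x = 0.0
        fine = 0.0    # fine-grid proxy for the integral functional
        coarse = 0.0  # integral sum over the n coarse grid points
        for j in range(m):
            hx = 1.0 if x >= 0.0 else 0.0
            fine += hx * dt
            if j % step == 0:
                coarse += hx * (T / n)
            x += rng.gauss(0.0, dt ** 0.5)
        total += abs(fine - coarse) ** p
    return total / n_paths


for n in (4, 16, 64):
    print(n, lp_error_estimate(n, p=2))
```

Plotting the logarithm of the estimate against $\log n$ gives a rough empirical slope, which is subject to both Monte Carlo noise and the bias of the fine-grid proxy.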
We would like to explain in this note that, in order to get the required approximation rates, one can modify some well-developed estimates from the theory of continuous additive functionals of Markov processes. An advantage of such an approach is that the assumptions on the process are formulated only in terms of its transition probability density and therefore are quite flexible. The basis for the approach is the fact that weak approximation rates, that is, bounds for the difference of the expectations of $I_T(h)$ and $I_{T,n}(h)$, are available as a consequence of a bound for the derivative w.r.t. $t$ of the transition probability density; see [4], Theorem 2.5, and Proposition 2.1 below. To explain the principal idea of the approach, let us assume for a while that $h$ is nonnegative and bounded. Then the integral functional $I_T(h)$ is a $W$-functional of the process $X$; see [1], Chapter 6. It is well known that the properties of a $W$-functional are mainly controlled by its characteristic, that is, by its expectation $E_x I_t(h)$ considered as a function of $t$ and $x$. In particular, the convergence of characteristics implies the $L_2$-convergence of the respective functionals. The core of our approach is that we extend Dynkin's technique for studying the convergence of $W$-functionals and give approximation rates for the integral functionals $I_T(h)$ by the difference functionals $I_{T,n}(h)$, based on the weak approximation rates for their expectations. We remark that we are now beyond the scope of Dynkin's original theory because $I_T(h)$ may fail to be a $W$-functional (we do not assume $h$ to be nonnegative), and $I_{T,n}(h)$ definitely fails to be a $W$-functional. In addition, Dynkin's theory addresses $L_2$-bounds, whereas, in general, we are interested in $L_p$-bounds. This brings some extra difficulties, which, however, are not really substantial, and we resolve them in a way similar to the one used in the classical Khas'minskii lemma; see, for example, Lemma 2.1 in [8].

Notation, assumptions, and auxiliaries
In what follows, $P_x$ denotes the law of the Markov process $X$ with $X_0 = x$, and $E_x$ denotes the expectation w.r.t. this law. The natural filtration of the process $X$ is denoted by $\{\mathcal{F}_t, t \ge 0\}$. The process $X$ is assumed to possess a transition probability density, denoted below by $p_t(x, y)$. By $C$ we denote a generic constant; the value of $C$ may vary from place to place. Both the absolute value of a real number and the Euclidean norm in $\mathbb{R}^d$ are denoted by $|\cdot|$.
Our standing assumption on the process X under investigation is the following.
Assumption X is motivated by the following class of processes of particular interest.
Example 2.1. Let $X$ be a symmetric $\alpha$-stable process with $\alpha \in (0, 2]$; in the case $\alpha = 2$ this is just a Brownian motion. Then, by the scaling property, $p_t(x, y) = t^{-d/\alpha} g^{(\alpha)}(t^{-1/\alpha}(y - x))$ with $g^{(\alpha)}$ being the distribution density of $X_1$. Accordingly, (2) holds with $Q = Q_\alpha$. Observe that, in a sense, this bound is "stable under perturbations of the process $X$." Namely, if $X$ is a uniformly elliptic diffusion with Hölder continuous coefficients, then (2) with $Q = Q_2$ and a properly chosen $c_2$ is provided by the classical parametrix method; see [3]. An analogue of the parametrix method for $\alpha$-stable generators with state-dependent coefficients yields the bound (2) with $Q = Q_{\alpha'}$, $\alpha' < \alpha$, for $\alpha$-stable driven processes $X$; see [2] and [6].
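The scaling form of the transition density in the case $\alpha = 2$ can be checked numerically. The sketch below makes assumptions beyond the text: the dimension is $d = 1$, the Brownian motion is normalized so that $X_1$ is standard normal, and the function names are hypothetical. It compares the explicit Gaussian transition density with $t^{-d/\alpha} g^{(\alpha)}(t^{-1/\alpha}(y - x))$.

```python
import math


def gauss_density(z):
    """Density g of X_1 for a standard one-dimensional Brownian motion."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bm_transition_density(t, x, y):
    """Explicit transition density p_t(x, y) of standard Brownian motion."""
    return math.exp(-((y - x) ** 2) / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def scaled_density(t, x, y, alpha=2.0, d=1):
    """Scaling form t^(-d/alpha) * g(t^(-1/alpha) * (y - x))."""
    return t ** (-d / alpha) * gauss_density(t ** (-1.0 / alpha) * (y - x))


for t, x, y in [(0.5, 0.0, 1.0), (2.0, -1.0, 0.3), (0.01, 0.2, 0.25)]:
    print(t, x, y, bm_transition_density(t, x, y), scaled_density(t, x, y))
```

For $\alpha < 2$ the density $g^{(\alpha)}$ has no elementary closed form in general, so the same comparison would need a numerical inversion of the characteristic function, which is not attempted here.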
Our principal assumption on the function h is the following.
• $V$ is submultiplicative.
Observe that for a bounded $h$, condition H1 holds trivially with $V \equiv 1$. On the other hand, in particular cases, one can weaken the assumptions on $h$ by using nontrivial "weight functions" $V$. For instance, if $Q = Q_\alpha$ from the above example, then one can take a weight function $V$ with an arbitrary constant $C$ and an exponent $\beta \in (0, \alpha)$. We denote by $\|h\|_V$ the weighted sup-norm of $h$.
The following auxiliary statement is crucial for the whole approach. Its proof is completely analogous to the proof of (a part of) Theorem 2.5 in [4], but in order to make the exposition self-contained, we give it here.

Proof.
Write We have, by the bound for p t (x, y) in (1) and properties of V , Next, n k=2 kT /n and therefore, using the bound for ∂ t p t (x, y) in (2), we obtain, similarly to (3), which completes the proof.

Approximation rate in terms of h V
Our main estimate, in its shortest and most transparent form, is presented in the following theorem, which concerns the case where the only assumption on $h$ is that the weighted sup-norm $\|h\|_V$ is finite.
we have Hence, this difference is an absolutely continuous function of $t$, and using the Newton-Leibniz formula twice, we get We then write where Let us estimate separately the expectations of $H^1_{T,n,p}(h)$ and $H^2_{T,n,p}(h)$. By the Hölder inequality,

Again by the Hölder inequality,
Because $t \in [s, \zeta_n(s)]$, we have $\eta_n(t) = \eta_n(s)$, and, consequently, By the properties of $V$ and the bound (1) we have for any $r \in (0, T]$. Using this bound with $r = t, s, \eta_n(s)$ and recalling that $|\zeta_n(s) - s| \le 1/n$, we get with constant $C$ depending on $T, Q, V, p$ only. Next, observe that, for every $s$, the variables By Proposition 2.1 and the Markov property of $X$ we have Hence, again, using the Hölder inequality we get Similarly to (4), we have Hence, the above bounds for $E_x H^1_{T,n,p}(h)$ and $E_x H^2_{T,n,p}(h)$ finally yield with a constant $C$ depending on $T, Q, V, p$ only. It can be seen easily that in this inequality one can write arbitrary $t \le T$ instead of $T$, with the same constant $C$.
Taking the integral over $t$, we get Because $\|h\|_V < \infty$ and $V^p$ satisfies the integrability condition of the theorem, the left-hand side of the last inequality is finite. Hence, resolving this inequality, we get which, together with (5), gives the required statement.

An improved approximation rate for a Hölder continuous h
In this section, we consider the case where h has the following additional regularity property.
H2. The function $h$ is Hölder continuous with index $\gamma \in (0, 1]$.
The additional regularity of $h$ allows one to improve the accuracy of the previous estimates. Namely, the following statement holds.

Theorem 2.2.
Assume that X, H1, and H2 hold. Then, for every p ≥ 2 such that we have with constant C depending on T, Q, V, p, γ only.
Proof. The method of the proof remains the same as that of Theorem 2.1; hence, we use the same notation. The only new point is that, instead of the bound now a more precise inequality is available, based on the Hölder continuity of h. Namely, we have where C depends on T, Q, p, γ only. The last inequality holds due to the following representation. By the Markov property of X, for r < s, we have Thus, for t ∈ [s, ζ n (s)], we have E x ∆ n (s) p/2 ∆ n (t)
In Theorem 2.3 of [7], $h$ satisfies our conditions H1 and H2 with $V(|x|) = e^{C|x|}$, and $X$ is a one-dimensional diffusion with coefficients that satisfy some smoothness condition (assumption (H)); in particular, X holds with $\alpha = 2$. In this case, our bound $C(\log n)^p n^{-p/2-(p\gamma)/4}$, given in Theorem 2.2, is somewhat worse than the one obtained in [7]. On the other hand, this bound is of independent interest because it applies to a wider class of processes $X$.