The asymptotic error of chaos expansion approximations for stochastic differential equations

In this paper we present a numerical scheme for stochastic differential equations based upon the Wiener chaos expansion. The approximation of the square integrable solution of a stochastic differential equation is obtained by truncating the infinite chaos expansion both in the chaos order and in the number of basis elements. We derive an explicit upper bound for the $L^2$ approximation error associated with our method. The proofs are based upon an application of Malliavin calculus.


Introduction
We consider a one-dimensional continuous stochastic process $(X_t)_{t\in[0,T]}$ that satisfies the stochastic differential equation
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \qquad X_0 = x_0, \qquad (1.1)$$
where $(W_t)_{t\in[0,T]}$ is a Brownian motion defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, P)$. Various numerical approximation schemes for the SDE (1.1) have been proposed and studied in the literature over the past decades. Probably the most prominent numerical approximation for the solution of (1.1) is the Euler scheme, which can be described as follows. Let $\varphi_n : \mathbb{R}_+ \to \mathbb{R}_+$ be the function defined by $\varphi_n(t) = i/n$ for $t \in [i/n, (i+1)/n)$. The continuous Euler approximation scheme is given by
$$dX^n_t = b\big(\varphi_n(t), X^n_{\varphi_n(t)}\big)\,dt + \sigma\big(\varphi_n(t), X^n_{\varphi_n(t)}\big)\,dW_t \quad \text{with } X^n_0 = x_0. \qquad (1.2)$$
The probabilistic properties of the Euler approximation scheme have been investigated in numerous papers. We refer to the classical works [2,3,14,16,17,23] for studies of weak and strong approximation errors, among many others. Asymptotic results in the framework of non-regular coefficients can be found in e.g. [1,8,9,27].

In this paper we take a different route and propose to use the Wiener chaos expansion (also called polynomial chaos in the literature) to approximate the solution of the SDE (1.1). To explain the ideas let us fix an orthonormal basis $(e_i)_{i\ge1}$ of the separable Hilbert space $L^2([0,T])$. It is a well-known statement (cf. [6]) that if $X_t \in L^2(\Omega, \mathcal{F}, P)$ for all $t \in [0,T]$, where $\mathcal{F}$ is generated by the Brownian motion $(W_t)_{t\in[0,T]}$, then it admits the chaotic expansion
$$X_t = \sum_{\alpha \in \mathcal{I}} x_\alpha(t)\, \Psi_\alpha, \qquad (1.3)$$
where the $x_\alpha$ are deterministic functions, the $\Psi_\alpha$ are mutually orthogonal random projections associated to the basis $(e_i)_{i\ge1}$, and the index set $\mathcal{I}$ is defined via
$$\mathcal{I} := \big\{\alpha = (\alpha_i)_{i\ge1} :\ \alpha_i \in \mathbb{N}_0 \text{ and almost all } \alpha_i\text{'s are } 0\big\}. \qquad (1.4)$$
Such orthogonal expansions have been successfully applied in numerous fields of stochastic and numerical analysis. We refer e.g. to [11,12,13,25] for applications of the Wiener chaos expansion in the context of SDEs and to [20,21,22,26] for applications of polynomial expansions in modelling, simulation and filtering of stochastic partial differential equations.

The aim of this work is to use the chaos expansion (1.3) to numerically approximate the solution of the SDE (1.1). For this purpose we need to truncate the infinite sum in (1.3). An obvious approach is to consider the approximation
$$X^{p,k}_t = \sum_{\alpha \in \mathcal{I}_{p,k}} x_\alpha(t)\, \Psi_\alpha, \qquad (1.5)$$
where the subset $\mathcal{I}_{p,k} \subset \mathcal{I}$ refers to using only orthogonal projections $\Psi_\alpha$ with respect to the first $k$ basis elements $(e_i)_{1\le i\le k}$ and only up to the $p$th order Wiener chaos. This method is most closely related to the articles [11,12,21,22]. More specifically, the works [21,22] study the $L^2$-error associated with the approximation (1.5), but only for a particular choice of the basis $(e_i)_{i\ge1}$. In this paper we study the decay rate of $E[(X_t - X^{p,k}_t)^2]$ as $k, p \to \infty$ for a general basis $(e_i)_{i\ge1}$ of $L^2([0,T])$, applying methods from Malliavin calculus.
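The Euler scheme (1.2) is straightforward to implement. The following minimal sketch (the helper name `euler_maruyama` is ours) simulates one path for user-supplied coefficient functions $b$ and $\sigma$:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T=1.0, n=1000, rng=None):
    """Euler scheme (1.2) evaluated on the grid t_i = i*T/n.

    b, sigma : coefficient functions (t, x) -> float of the SDE (1.1).
    Returns the time grid and the approximation X^n on that grid.
    """
    rng = np.random.default_rng(rng)
    t = np.linspace(0.0, T, n + 1)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    dw = rng.normal(0.0, np.sqrt(dt), size=n)  # Brownian increments
    for i in range(n):
        x[i + 1] = x[i] + b(t[i], x[i]) * dt + sigma(t[i], x[i]) * dw[i]
    return t, x

# Example: geometric Brownian motion dX = X dt + X dW.
t, x = euler_maruyama(lambda t, x: x, lambda t, x: x, x0=1.0, rng=0)
```

With $\sigma \equiv 0$ the scheme reduces to the explicit Euler method for the ODE $x' = b(t,x)$, which gives a quick sanity check of the implementation.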
The paper is structured as follows. In Section 2 we present the elements of Malliavin calculus. The main results of the paper are presented in Section 3. Section 4 is devoted to the proofs. In Section 5 we illustrate our approach with exemplary numerical results for the Haar and a trigonometric basis, and propose a heuristic based on sparse indices for computational speedup.

Background on Malliavin calculus
In this section we introduce some basic concepts of Malliavin calculus. The interested reader is referred to [24] for a more thorough exposition of this subject. Set $H = L^2([0,T])$ and let $\langle \cdot, \cdot \rangle_H$ denote the scalar product on $H$. We note that $H$ is a separable Hilbert space and denote by $(e_i)_{i\ge1}$ an orthonormal basis of $H$. We consider the isonormal Gaussian family $W = \{W(h) : h \in H\}$ indexed by $H$ and defined on a probability space $(\Omega, \mathcal{F}, P)$, i.e. the random variables $W(h)$ are centred Gaussian with covariance structure determined via
$$E[W(g)W(h)] = \langle g, h \rangle_H.$$
Here and throughout the paper we assume that $\mathcal{F} = \sigma(W)$. In our setting we consider $W(h) = \int_0^T h_s\,dW_s$, where $W$ is a standard Brownian motion. We define the normalised Hermite polynomials through the identities
$$H_0(x) = 1, \qquad H_n(x) = \frac{(-1)^n}{\sqrt{n!}}\, e^{x^2/2}\, \frac{d^n}{dx^n} e^{-x^2/2}, \quad n \ge 1.$$
The $n$th Wiener chaos $\mathcal{H}_n$ is the closed linear subspace of $L^2(\Omega, \mathcal{F}, P)$ generated by the family of random variables $\{H_n(W(h)) : \|h\|_H = 1\}$. The vector spaces $\mathcal{H}_n$, $n \ge 0$, are orthogonal and we have the Wiener chaos decomposition
$$L^2(\Omega, \mathcal{F}, P) = \bigoplus_{n=0}^{\infty} \mathcal{H}_n.$$
For a multiindex $\alpha \in \mathcal{I}$ with $|\alpha| := \sum_{i\ge1} \alpha_i$ we define the random projections
$$\Psi_\alpha := \prod_{i\ge1} H_{\alpha_i}(W(e_i)),$$
where $(e_i)_{i\ge1}$ is a fixed orthonormal basis of $H$.
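The normalised Hermite polynomials can be generated by the probabilists' recurrence $He_{n+1}(x) = x\,He_n(x) - n\,He_{n-1}(x)$ together with $H_n = He_n/\sqrt{n!}$. The following sketch (helper names are ours) verifies the orthonormality $E[H_n(\xi)H_m(\xi)] = \delta_{nm}$ for $\xi \sim N(0,1)$ by Gauss-Hermite quadrature:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normalized_hermite(n, x):
    """H_n(x) = He_n(x)/sqrt(n!) via the probabilists' recurrence."""
    he_prev = np.ones_like(x)
    if n == 0:
        return he_prev
    he = np.asarray(x, dtype=float).copy()  # He_1(x) = x
    for k in range(1, n):
        he_prev, he = he, x * he - k * he_prev  # He_{k+1} = x He_k - k He_{k-1}
    return he / math.sqrt(math.factorial(n))

# Quadrature nodes/weights for the weight exp(-x^2/2); weights sum to sqrt(2*pi).
nodes, weights = hermegauss(60)
weights = weights / math.sqrt(2.0 * math.pi)

def gaussian_moment(f):
    """E[f(xi)] for xi ~ N(0,1), exact for polynomials of degree < 120."""
    return float(np.sum(weights * f(nodes)))

# Gram matrix E[H_n H_m]; it should be the identity.
gram = [[gaussian_moment(lambda x, n=n, m=m:
                         normalized_hermite(n, x) * normalized_hermite(m, x))
         for m in range(6)] for n in range(6)]
```

The same quadrature check extends to the products $\Psi_\alpha$, since the $W(e_i)$ are independent standard Gaussians.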
We deduce that the set $\{\Psi_\alpha : \alpha \in \mathcal{I} \text{ with } |\alpha| = n\}$ forms a complete orthonormal basis of the $n$th Wiener chaos $\mathcal{H}_n$ and consequently $\{\Psi_\alpha : \alpha \in \mathcal{I}\}$ is a complete orthonormal basis of $L^2(\Omega, \mathcal{F}, P)$ (cf. [24]). For a symmetric function $h \in H^{\otimes n}$ we denote by $I_n(h)$ the $n$th multiple Wiener integral of $h$. Furthermore, for any symmetric $h \in H^{\otimes n}$, $g \in H^{\otimes m}$, the following multiplication formula holds:
$$I_n(h)\, I_m(g) = \sum_{r=0}^{n \wedge m} r! \binom{n}{r} \binom{m}{r} I_{n+m-2r}(g \otimes_r h), \qquad (2.6)$$
where the $r$th contraction $g \otimes_r h \in H^{\otimes(n+m-2r)}$ is defined by
$$(g \otimes_r h)(t_1, \ldots, t_{n+m-2r}) = \int_{[0,T]^r} g(t_1, \ldots, t_{m-r}, s_1, \ldots, s_r)\, h(t_{m-r+1}, \ldots, t_{n+m-2r}, s_1, \ldots, s_r)\, ds_1 \cdots ds_r.$$
The Wiener chaos decomposition transfers to the context of multiple integrals as follows. For each random variable $F \in L^2(\Omega, \mathcal{F}, P)$ we obtain the orthogonal decomposition
$$F = \sum_{n=0}^{\infty} I_n(g_n), \qquad (2.7)$$
where $I_0(g_0) = E[F]$ and the decomposition is unique when the $g_n$, $n \ge 1$, are assumed to be symmetric (cf. [24, Theorem 1.1.2]).

Next, we introduce the notion of the Malliavin derivative and its adjoint operator. We define the set of smooth random variables via
$$\mathcal{S} = \big\{F = f(W(h_1), \ldots, W(h_n)) :\ h_1, \ldots, h_n \in H,\ n \ge 1\big\},$$
where $f \in C^\infty_p(\mathbb{R}^n)$ (i.e. the space of infinitely differentiable functions such that all derivatives exhibit polynomial growth). The first order Malliavin derivative of $F \in \mathcal{S}$ is defined by
$$D_t F = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\big(W(h_1), \ldots, W(h_n)\big)\, h_i(t), \qquad (2.8)$$
and the $k$th order Malliavin derivative $D^k F$ is obtained by iterating $D$. Notice that $D^k F$ is an $H^{\otimes k}$-valued random variable, and we write $D^k_x F$ for the realisation of the function $D^k F$ at the point $x \in [0,T]^k$. The space $\mathbb{D}^{k,q}$ denotes the completion of the set $\mathcal{S}$ with respect to the norm
$$\|F\|_{k,q}^q = E[|F|^q] + \sum_{m=1}^{k} E\big[\|D^m F\|^q_{H^{\otimes m}}\big].$$
We define $\mathbb{D}^{k,\infty} = \cap_{q>1} \mathbb{D}^{k,q}$. The Malliavin derivative of the random variable $\Psi_\alpha \in \mathcal{S}$, $\alpha \in \mathcal{I}$, can be easily computed using the definition (2.8) and the formula $H'_n(x) = \sqrt{n}\, H_{n-1}(x)$:
$$D_t \Psi_\alpha = \sum_{i \ge 1} \sqrt{\alpha_i}\, \Psi_{\alpha - (i)}\, e_i(t), \qquad (2.9)$$
where $\alpha - (i)$ denotes the diminished multiindex obtained from $\alpha$ by decreasing the $i$th component by one, i.e. $(\alpha - (i))_j = \alpha_j - \mathbb{1}_{\{j = i\}}$. $\qquad (2.10)$
Higher order Malliavin derivatives of $\Psi_\alpha$ are computed similarly. The operator $D^k$ possesses an unbounded adjoint denoted by $\delta^k$, which is often referred to as the multiple Skorokhod integral.
The following integration by parts formula holds (see [24, Exercise 1.3.7]): if $u \in \mathrm{Dom}(\delta^k)$ and $F \in \mathbb{D}^{k,2}$, where $\mathrm{Dom}(\delta^k)$ consists of all elements $u \in L^2(\Omega; H^{\otimes k})$ such that the inequality $|E[\langle D^k F, u \rangle_{H^{\otimes k}}]| \le c\, (E[F^2])^{1/2}$ holds for some $c > 0$ and all $F \in \mathbb{D}^{k,2}$, then we have the identity
$$E[F\, \delta^k(u)] = E\big[\langle D^k F, u \rangle_{H^{\otimes k}}\big]. \qquad (2.11)$$
In case the random variable $F \in L^2(\Omega, \mathcal{F}, P)$ has a chaos decomposition as displayed in (2.7), the statement $F \in \mathbb{D}^{k,2}$ is equivalent to the condition $\sum_{n=1}^{\infty} n^k\, n!\, \|g_n\|^2_{H^{\otimes n}} < \infty$. In particular, when $F \in \mathbb{D}^{1,2}$ we deduce the explicit chaos representation of the Malliavin derivative
$$D_t F = \sum_{n=1}^{\infty} n\, I_{n-1}(g_n(\cdot, t)),$$
where $g_n(\cdot, t) : [0,T]^{n-1} \to \mathbb{R}$ is obtained from the function $g_n$ by setting the last argument equal to $t$ (see [24, Proposition 1.2.7]).

Finally, we present an explicit formula for the Malliavin derivative of a solution of a stochastic differential equation. Assume that $(X_t)_{t\in[0,T]}$ is a solution of the stochastic differential equation (1.1) and $b, \sigma \in C^1(\mathbb{R})$. In this setting $DX_t = (D_s X_t)_{s\in[0,T]}$ is given as the solution of the SDE
$$D_s X_t = \sigma(s, X_s) + \int_s^t \frac{\partial}{\partial x} b(u, X_u)\, D_s X_u\, du + \int_s^t \frac{\partial}{\partial x} \sigma(u, X_u)\, D_s X_u\, dW_u \quad \text{for } s \le t, \qquad (2.13)$$
and $D_s X_t = 0$ for $s > t$. Throughout the paper $\partial/\partial x$ denotes the derivative with respect to the space variable and $\partial/\partial t$ denotes the derivative with respect to the time variable.

Main results
We start with the analysis of the Wiener chaos expansion introduced in (1.3). Under the square integrability assumption on the solution $X$ of the SDE (1.1) we obtain the Wiener chaos expansion
$$X_t = \sum_{\alpha \in \mathcal{I}} x_\alpha(t)\, \Psi_\alpha \qquad \text{with } x_\alpha(t) = E[X_t \Psi_\alpha].$$
In order to study the strong approximation error we require good control of the coefficients $x_\alpha(t)$. When $X_t$ is sufficiently smooth in the Malliavin sense, we deduce the identity
$$x_\alpha(t) = E[X_t \Psi_\alpha] = \frac{1}{\sqrt{\alpha!}}\, E\big[\langle D^{|\alpha|} X_t,\, e^{\otimes \alpha} \rangle_{H^{\otimes |\alpha|}}\big] \qquad (3.1)$$
by applying the duality formula (2.11), where $\alpha! := \prod_{i\ge1} \alpha_i!$ and $e^{\otimes \alpha}$ denotes the tensor product of the functions $e_i$, each taken $\alpha_i$ times. In fact, the coefficients $x_\alpha(t)$ satisfy a system of ordinary differential equations. The following propagator system has been derived in [12,21]. We state the proof for completeness.
Proposition 3.1. Let $X$ be the square integrable solution of the SDE (1.1). Then $X_t$ possesses the chaos expansion (1.3) and the coefficients $x_\alpha(t)$ satisfy the system of ordinary differential equations
$$x'_\alpha(t) = E[b(t, X_t) \Psi_\alpha] + \sum_{j\ge1} \sqrt{\alpha_j}\, e_j(t)\, E[\sigma(t, X_t) \Psi_{\alpha-(j)}], \qquad x_\alpha(0) = x_0\, \mathbb{1}_{\{\alpha = 0\}}. \qquad (3.2)$$

Proof. Using the SDE (1.1) and applying the formula (3.1) we obtain the identity
$$x_\alpha(t) = E[X_t \Psi_\alpha] = x_0\, \mathbb{1}_{\{\alpha = 0\}} + \int_0^t E[b(s, X_s) \Psi_\alpha]\, ds + E\Big[\int_0^t \sigma(s, X_s)\, dW_s\; \Psi_\alpha\Big].$$
Applying the formula (3.1) once again to the random variable $b(s, X_s)$ we immediately obtain the first integral term. On the other hand, observing the identity $\delta(\mathbb{1}_{[0,t]}\, \sigma(\cdot, X_\cdot)) = \int_0^t \sigma(s, X_s)\, dW_s$, we get by the duality formula (2.11) and (2.9) that
$$E\Big[\int_0^t \sigma(s, X_s)\, dW_s\; \Psi_\alpha\Big] = E\big[\langle D \Psi_\alpha,\, \mathbb{1}_{[0,t]}\, \sigma(\cdot, X_\cdot) \rangle_H\big] = \sum_{j\ge1} \sqrt{\alpha_j} \int_0^t e_j(s)\, E[\sigma(s, X_s) \Psi_{\alpha-(j)}]\, ds.$$
Putting things together we obtain the identity
$$x_\alpha(t) = x_0\, \mathbb{1}_{\{\alpha = 0\}} + \int_0^t E[b(s, X_s) \Psi_\alpha]\, ds + \sum_{j\ge1} \sqrt{\alpha_j} \int_0^t e_j(s)\, E[\sigma(s, X_s) \Psi_{\alpha-(j)}]\, ds.$$
Consequently, the assertion follows after taking the derivative with respect to $t$.
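For geometric Brownian motion $dX_t = bX_t\,dt + \sigma X_t\,dW_t$ the propagator closes as $x'_\alpha(t) = b\,x_\alpha(t) + \sigma \sum_j \sqrt{\alpha_j}\, e_j(t)\, x_{\alpha-(j)}(t)$, and its numerical solution can be checked against the explicit coefficients of Example 3.2. The following sketch does exactly that; the cosine basis of $L^2([0,1])$ and the classical RK4 integrator are our illustrative choices:

```python
import math
from itertools import product
import numpy as np

T, b, sigma, x0 = 1.0, 1.0, 1.0, 1.0
p, k = 3, 4

# Cosine basis of L^2([0,1]) (one common choice): e_1 = 1, e_j = sqrt(2) cos((j-1) pi t).
def e(j, t):
    return 1.0 if j == 1 else math.sqrt(2.0) * math.cos((j - 1) * math.pi * t)

def antider(j, t):  # E_j(t) = int_0^t e_j(s) ds
    return t if j == 1 else math.sqrt(2.0) * math.sin((j - 1) * math.pi * t) / ((j - 1) * math.pi)

# All multi-indices alpha with |alpha| <= p over the first k basis elements.
idx = [a for a in product(range(p + 1), repeat=k) if sum(a) <= p]
pos = {a: i for i, a in enumerate(idx)}

def rhs(t, x):
    # Propagator for GBM: x_a' = b x_a + sigma sum_j sqrt(a_j) e_j(t) x_{a-(j)}
    dx = b * x
    for a, i in pos.items():
        for j in range(k):
            if a[j] > 0:
                am = a[:j] + (a[j] - 1,) + a[j + 1:]
                dx[i] += sigma * math.sqrt(a[j]) * e(j + 1, t) * x[pos[am]]
    return dx

# RK4 integration of the (finite) propagator system, x_a(0) = x0 1_{a=0}.
x = np.zeros(len(idx)); x[pos[(0,) * k]] = x0
n, h = 2000, T / 2000
for step in range(n):
    t = step * h
    k1 = rhs(t, x.copy()); k2 = rhs(t + h / 2, x + h / 2 * k1)
    k3 = rhs(t + h / 2, x + h / 2 * k2); k4 = rhs(t + h, x + h * k3)
    x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Explicit coefficients: x_a(T) = x0 e^{bT} sigma^{|a|} prod_j E_j(T)^{a_j} / sqrt(a_j!)
exact = np.array([x0 * math.exp(b * T) * sigma ** sum(a)
                  * math.prod(antider(j + 1, T) ** a[j] / math.sqrt(math.factorial(a[j]))
                              for j in range(k)) for a in idx])
err = float(np.max(np.abs(x - exact)))
```

Note that for GBM the equation for $x_\alpha$ only involves coefficients of strictly lower order, so the truncated system reproduces the exact coefficients without any truncation bias.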
We remark that the propagator system (3.2) is recursive. Let us give some simple examples to illustrate how (3.2) can be solved explicitly.
Example 3.2. (i) (Brownian motion with drift) Consider the SDE $dX_t = b\, dt + \sigma\, dW_t$ with constant coefficients, i.e. $X_t = x_0 + bt + \sigma W_t$. In this case we obviously have that $x_\alpha(t) = 0$ for any $\alpha \in \mathcal{I}$ with $|\alpha| \ge 2$. Applying formula (3.2) we obtain the representation
$$X_t = x_0 + bt + \sigma \sum_{j\ge1} \Big(\int_0^t e_j(s)\, ds\Big)\, W(e_j),$$
which is the well-known Karhunen-Loève expansion of the scaled Brownian motion.
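The Parseval identity behind this expansion, $\sum_{j\ge1} (\int_0^t e_j(s)\,ds)^2 = t = \mathrm{Var}(W_t)$, can be checked numerically; the cosine basis of $L^2([0,1])$ used below is one illustrative choice:

```python
import math

def E_j(j, t):
    """int_0^t e_j(s) ds for the cosine basis e_1 = 1, e_j = sqrt(2) cos((j-1) pi s)."""
    return t if j == 1 else math.sqrt(2.0) * math.sin((j - 1) * math.pi * t) / ((j - 1) * math.pi)

t = 0.5
# Partial sums of sum_j E_j(t)^2 for increasing numbers of basis elements.
partial = [sum(E_j(j, t) ** 2 for j in range(1, k + 1)) for k in (10, 100, 1000)]
# The partial sums increase monotonically towards Var(W_t) = t.
```

By Bessel's inequality every partial sum stays below $t$, and the remainder is exactly the basis-truncation error of the Karhunen-Loève expansion at time $t$.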
(ii) (Geometric Brownian motion) Let us consider the geometric Brownian motion defined via the SDE
$$dX_t = b X_t\, dt + \sigma X_t\, dW_t, \qquad X_0 = x_0.$$
In this setting the propagator system (3.2) translates to
$$x'_\alpha(t) = b\, x_\alpha(t) + \sigma \sum_{j\ge1} \sqrt{\alpha_j}\, e_j(t)\, x_{\alpha-(j)}(t).$$
This system of ordinary differential equations can be solved recursively. For $\alpha = 0$ we have $x'_0(t) = b x_0(t)$ and hence $x_0(t) = x_0 \exp(bt)$. If $\alpha$ is the $j$th canonical unit vector in $\mathcal{I}$ (and hence $|\alpha| = 1$) we obtain the differential equation
$$x'_\alpha(t) = b\, x_\alpha(t) + \sigma\, e_j(t)\, x_0 \exp(bt).$$
Hence, $x_\alpha(t) = x_0\, \sigma \exp(bt) \int_0^t e_j(s)\, ds$. Following this recursion we obtain the general formula
$$x_\alpha(t) = x_0 \exp(bt)\, \frac{\sigma^p}{\sqrt{\alpha!}} \prod_{j\ge1} \Big(\int_0^t e_j(s)\, ds\Big)^{\alpha_j}$$
for any $\alpha \in \mathcal{I}$ with $|\alpha| = p$.
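The coefficient formula above implies, via the multinomial theorem, that $\sum_{|\alpha|=n} x_\alpha(t)^2 = x_0^2 e^{2bt} (\sigma^2 S_k(t))^n / n!$ with $S_k(t) = \sum_{j\le k} (\int_0^t e_j(s)\,ds)^2$ when $\alpha$ ranges over the first $k$ basis elements. The following sketch (cosine basis and helper names are ours) verifies this identity by brute-force enumeration:

```python
import math
from itertools import product

t, b, sigma, x0 = 0.5, 1.0, 1.0, 1.0
k, p = 6, 4

def c(j):  # int_0^t e_j(s) ds for the cosine basis of L^2([0,1])
    return t if j == 1 else math.sqrt(2.0) * math.sin((j - 1) * math.pi * t) / ((j - 1) * math.pi)

cs = [c(j) for j in range(1, k + 1)]
S = sum(ci ** 2 for ci in cs)

def x_alpha_sq(a):
    # x_alpha(t)^2 with x_alpha(t) = x0 e^{bt} sigma^{|a|} prod_j c_j^{a_j} / sqrt(a_j!)
    n = sum(a)
    return (x0 * math.exp(b * t) * sigma ** n) ** 2 * math.prod(
        cs[j] ** (2 * a[j]) / math.factorial(a[j]) for j in range(k))

checks = []
for n in range(p + 1):
    lhs = sum(x_alpha_sq(a) for a in product(range(n + 1), repeat=k) if sum(a) == n)
    rhs = (x0 * math.exp(b * t)) ** 2 * (sigma ** 2 * S) ** n / math.factorial(n)
    checks.append((lhs, rhs))  # equal by the multinomial theorem
```

Summing the right-hand side over $n \le p$ gives a closed form for the second moment of the truncation $X^{p,k}_t$, which is convenient for error studies such as the ones in Section 5.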
For a general specification of the drift coefficient $b$ and diffusion coefficient $\sigma$ in model (1.1) the propagator system (3.2) cannot be solved explicitly. Thus, the precise infinite dimensional Wiener chaos expansion (1.3) is out of reach. For simulation purposes it is an obvious idea to consider a finite subset of $\mathcal{I}$ in the expansion (1.3). We introduce the index set
$$\mathcal{I}_{p,k} := \big\{\alpha \in \mathcal{I} :\ |\alpha| \le p \text{ and } \alpha_i = 0 \text{ for all } i > k\big\}. \qquad (3.4)$$
The approximation of $X_t$ is now defined via (1.5). We remark that the quantity $X^{p,k}_t$ is more useful than the Euler approximation $X^n_t$ introduced in (1.2) if we are interested in the approximation of the first two moments of $X_t$. Indeed, the first two moments of $X^{p,k}_t$ are given explicitly by
$$E[X^{p,k}_t] = x_0(t) \qquad \text{and} \qquad E\big[(X^{p,k}_t)^2\big] = \sum_{\alpha \in \mathcal{I}_{p,k}} x_\alpha(t)^2,$$
while higher order moments can be computed via an application of the multiplication formula (2.6).

The strong approximation error associated with the truncation (3.4) has been previously studied in [21,22] in a slightly different context. In particular, in both papers the authors consider one specific basis $(e_i)_{i\ge1}$ of $L^2([0,T])$, whereas we are interested in the asymptotic analysis for a general basis $(e_i)_{i\ge1}$. While [22] mostly uses methods from analysis, our approach is based upon Malliavin calculus and is close in spirit to [21]. The main result of our paper, Theorem 3.3, gives an upper bound (3.6) on the $L^2$-error $E[(X_t - X^{p,k}_t)^2]$, which consists of two terms: the first one, of order $((p+1)!)^{-1}$, controls the truncation of the chaos order, while the second one is expressed in terms of the functions $E_l(t) := \int_0^t e_l(s)\, ds$ for $l > k$ and controls the truncation of the basis. Since $W(e_l) = \int_0^T e_l(s)\, dW_s$, we readily deduce that the upper bound on the right-hand side of (3.6) indeed converges to $0$ when $p, k \to \infty$. We also note that the error associated with the truncation of the chaos order does not depend on the particular choice of the basis $(e_i)_{i\ge1}$, while the error associated with the truncation of the basis strongly depends on $(e_i)_{i\ge1}$ (which is not really surprising). Furthermore, we remark that it is computationally very costly to compute all coefficients $x_\alpha(t)$, $\alpha \in \mathcal{I}_{p,k}$, for a large chaos order $p$. Thanks to the first bound $((p+1)!)^{-1}$ in (3.6) it is sufficient to use small values of $p$ in practical situations (usually $p \le 4$).

Example 3.4. (i) (Trigonometric basis) For a trigonometric basis of $L^2([0,T])$ we obtain in this setting that
$$\max_{t\in[0,T]} E_j^2(t) \le C j^{-2}$$
for any $j \ge 1$.
Consequently, we deduce that $\sum_{j > k} \max_{t\in[0,T]} E_j^2(t) \le C k^{-1}$ for some constant $C > 0$.
(ii) (Haar basis) Consider the Haar basis of $L^2([0,1])$, where the Haar function $e_{j,n}$ has amplitude $2^{(n-1)/2}$ and support $[2^{-n+1}(j-1),\, 2^{-n+1} j]$. In this case we deduce the following representation for $E_{j,n}(t)$:
$$E_{j,n}(t) = \begin{cases} 2^{(n-1)/2}\,\big(t - 2^{-n+1}(j-1)\big) & \text{if } t \in \big[2^{-n+1}(j-1),\, 2^{-n+1}(j-\tfrac12)\big), \\[2pt] 2^{(n-1)/2}\,\big(2^{-n+1} j - t\big) & \text{if } t \in \big[2^{-n+1}(j-\tfrac12),\, 2^{-n+1} j\big], \\[2pt] 0 & \text{else.} \end{cases}$$
Since $\max_{t\in[0,1]} E_{j,n}^2(t) = 2^{-(n+1)}$ and $E_{j,n}(t) \ne 0$ only for $t \in [2^{-n+1}(j-1),\, 2^{-n+1} j]$, we finally obtain that the basis-truncation error decays at the exponential rate $2^{-n}$ in the level $n$. Note that this exponential asymptotic rate for a fixed $p$th order Wiener chaos is equivalent to $O(k^{-1})$ if we choose $k = 2^n$ basis elements $(e_i)$. We will come back to these two bases in Section 5, where we provide exemplary numerical results that highlight this identical asymptotic rate, but also the different error distributions over time.
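The peak value $\max_t E^2_{j,n}(t) = 2^{-(n+1)}$ can be confirmed numerically by integrating a Haar function; the normalisation convention below (amplitude $2^{(n-1)/2}$ on a support of length $2^{-n+1}$) is the one used above:

```python
import numpy as np

def haar(j, n, t):
    """Haar function at level n >= 1, shift j, on [0,1]; unit L^2 norm."""
    lo, mid, hi = 2.0 ** (1 - n) * (j - 1), 2.0 ** (1 - n) * (j - 0.5), 2.0 ** (1 - n) * j
    if lo <= t < mid:
        return 2.0 ** ((n - 1) / 2)
    if mid <= t < hi:
        return -(2.0 ** ((n - 1) / 2))
    return 0.0

ts = np.linspace(0.0, 1.0, 100001)
peaks = {}
for n in (2, 5, 8):
    # E_{1,n}(t) = int_0^t e_{1,n}(s) ds by the trapezoidal rule.
    vals = np.array([haar(1, n, t) for t in ts])
    E = np.concatenate([[0.0], np.cumsum((vals[1:] + vals[:-1]) / 2.0) * (ts[1] - ts[0])])
    peaks[n] = float(np.max(E ** 2))  # tent function squared, peak 2^{-(n+1)}
```

The tent shape of $E_{j,n}$ also makes visible why the Haar expansion has almost zero error on the dyadic grid points: there the antiderivatives of all truncated basis functions vanish simultaneously.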

Proofs
Throughout the proofs $C = C(t, K)$ denotes a generic positive constant, which may change from line to line. We start with Proposition 4.1, which gives an upper bound for the $L^2$-norm of the Malliavin derivatives of $X_t$.

Proof. We show the assertion of Proposition 4.1 by induction over $n$. For $n = 0$ the result is a well-known consequence of the Lipschitz and linear growth conditions (3.5) on the functions $b$ and $\sigma$ (see e.g. [15, Theorem 2.9, page 289], which relates the second moment of $X_t$ to the second moment of the initial condition $X_0 = x_0$). For $n = 1$ we know from the formula (2.13) that the process $(D_s X_t)_{s\in[0,T]}$ satisfies the SDE
$$D_s X_t = \sigma(s, X_s) + \int_s^t \frac{\partial}{\partial x} b(u, X_u)\, D_s X_u\, du + \int_s^t \frac{\partial}{\partial x} \sigma(u, X_u)\, D_s X_u\, dW_u$$
for $s \le t$, and $D_s X_t = 0$ for $s > t$. Next, we introduce the two-dimensional stochastic process $Y^{(1)}_{s;t} := (X_t, D_s X_t)$, whose second component is driven by the coefficients
$$\bar{b}(u, y) = \frac{\partial}{\partial x} b(u, y_1)\, y_2 \qquad \text{and} \qquad \bar{\sigma}(u, y) = \frac{\partial}{\partial x} \sigma(u, y_1)\, y_2 \qquad \text{for } y = (y_1, y_2).$$
Applying again the estimates of [15, Theorem 2.9, page 289] to the diffusion process $(Y^{(1)}_{s;t})_{t \ge s}$, we conclude the desired estimate.

Now we perform the induction step. Notice that $D_s X_t \in \mathbb{D}^{1,\infty}$, so its Malliavin derivative satisfies again an SDE according to [24, Theorem 2.2.2]. We define the stochastic process
$$Y^{(n)}_{s_1,\ldots,s_n;t} := \big(X_t,\, D_{s_1} X_t,\, D^2_{s_1,s_2} X_t,\, \ldots,\, D^n_{s_1,\ldots,s_n} X_t\big)$$
and assume that the assertion holds for all $k = 1, \ldots, n-1$. In order to compute the estimate, we have to consider the initial values of the SDE system satisfied by $Y^{(n)}$. Here the relevant sums run over the set $\mathcal{P}_m$ of all partitions $J_1 \cup \cdots \cup J_\nu$ of $\{1, \ldots, m\}$ into disjoint sets $J_1, \ldots, J_\nu$; for the first order derivative we simply have $z(t) = \sigma(t, X_t)$. With these notations at hand, we find by (2.13) and induction that the $n$th order Malliavin derivative $D^n_{s_1,\ldots,s_n} X_t$ satisfies a linear SDE for $\hat{s} := \max\{s_1, \ldots, s_n\} \le t$, and $D^n_{s_1,\ldots,s_n} X_t = 0$ otherwise. Assuming without loss of generality that $\hat{s} = s_1$, its initial value is given by $z(s_1, \ldots, s_n)$, which is a sum of terms of the form
$$\frac{\partial^\nu}{\partial x^\nu} \sigma(s_1, X_{s_1})\; D^{|J_1|} X_{s_1} \cdots D^{|J_\nu|} X_{s_1}$$
over the partitions $J_1 \cup \cdots \cup J_\nu$ of $\{2, \ldots, n\}$, where $D^{|J_i|} X_{s_1}$ is evaluated at the time points $(s_j)_{j \in J_i}$; the term with the maximal number of blocks reads $\frac{\partial^{n-1}}{\partial x^{n-1}} \sigma(s_1, X_{s_1})\, D_{s_2} X_{s_1} \cdots D_{s_n} X_{s_1}$.
Finally, we apply [15, Theorem 2.9, page 289] as in the case $n = 1$, taking into account that the assertion holds for $k = 1, \ldots, n-1$. This completes the proof of Proposition 4.1.

We now turn to the proof of Theorem 3.3 and decompose the approximation error into the two truncation contributions
$$E\big[(X_t - X^{p,k}_t)^2\big] = A_1(p) + A_2(p,k) := \sum_{\alpha \in \mathcal{I}:\, |\alpha| > p} x_\alpha(t)^2 + \sum_{\alpha \in \mathcal{I}:\, |\alpha| \le p,\ d(\alpha) > k} x_\alpha(t)^2, \qquad (4.6)$$
where $d(\alpha) := \max\{i \ge 1 : \alpha_i > 0\}$. To determine an upper bound for $A_1(p)$, we use the chaos expansions (1.3) and (2.7) of $X_t$, i.e. $X_t = \sum_{n \ge 0} I_n(\xi_n(\cdot\,; t))$ with $t_n = (t_1, \ldots, t_n)$ and symmetric kernel functions $\xi_n(t_n; t)$ being defined by $\xi_n(\cdot\,; t) = (n!)^{-1} E[D^n X_t]$. By the orthogonality of the Wiener chaoses we have
$$E\big[I_n(\xi_n(\cdot\,;t))^2\big] = n!\, \big\langle \xi_n(\cdot\,;t),\, \xi_n(\cdot\,;t) \big\rangle_{H^{\otimes n}} = (n!)^2 \int_{\Delta_{(n);t}} \xi_n(t_n; t)^2\, dt_n, \qquad (4.7)$$
where we abbreviate $\Delta_{(n);t} := \{(t_1, \ldots, t_n) : 0 \le t_1 < \cdots < t_n \le t\}$. Therefore, we deduce via (4.7) that
$$A_1(p) = \sum_{n > p} \frac{1}{n!} \int_{[0,t]^n} \big(E[D^n_{t_1,\ldots,t_n} X_t]\big)^2\, dt_n. \qquad (4.8)$$
Finally, using (4.8) and applying Proposition 4.1 we obtain
$$A_1(p) \le \sum_{n > p} \frac{(Ct)^n}{n!} \le \frac{C}{(p+1)!}. \qquad (4.9)$$
The treatment of the term $A_2(p,k)$ is more involved. Recalling the definition $d(\alpha) = \max\{i \ge 1 : \alpha_i > 0\}$, we introduce the characteristic set $(i_1, \ldots, i_n)$ of $\alpha \in \mathcal{I}$ with $i_1 \le i_2 \le \cdots \le i_n$ and $|\alpha| = n$. It is defined as follows: $i_1$ is the index of the first non-zero component of $\alpha$. If $\alpha_{i_1} = 1$, then $i_2$ is the index of the second non-zero component of $\alpha$; otherwise $i_1 = i_2 = \cdots = i_{\alpha_{i_1}}$, and $i_{\alpha_{i_1}+1}$ is the index of the second non-zero component of $\alpha$. The same operation is repeated for the index $i_{\alpha_{i_1}+1}$ and the further non-zero components of $\alpha$. In this fashion the characteristic set is constructed, and we observe in particular that $d(\alpha) = i_n$. To give a simple example, consider the multiindex $\alpha = (2, 0, 1, 4, 0, 0, \ldots) \in \mathcal{I}$. Here the characteristic set of $\alpha$ is given by $(1, 1, 3, 4, 4, 4, 4)$.
On the other hand, writing $e_\alpha(t_n) := e_{i_1}(t_1) \cdots e_{i_n}(t_n)$ for the characteristic set $(i_1, \ldots, i_n)$ of $\alpha$, we obtain for any $\alpha \in \mathcal{I}$ with $|\alpha| = n$ the identity
$$x_\alpha(t) = \frac{n!}{\sqrt{\alpha!}} \int_{\Delta_{(n);t}} \xi_n(t_n; t)\, e_\alpha(t_n)\, dt_n \qquad (4.11)$$
by (2.11). Here $t_n^j$ denotes the vector obtained from $t_n$ by omitting $t_j$, and $\alpha - (\cdot)$ denotes the diminished multiindex as defined in (2.10). By changing the order of integration in (4.11) we isolate the inner integral with respect to $t_j$, with the conventions $t_0 := 0$ and $t_{n+1} := t$; we refer to the resulting representation as (4.12). Then, for any $i_n = l \ge 1$, we integrate by parts to deduce
$$\int_{t_{j-1}}^{t_{j+1}} \xi_n(t_n; t)\, e_l(t_j)\, dt_j = \Big[\xi_n(t_n; t)\, E_l(t_j)\Big]_{t_j = t_{j-1}}^{t_j = t_{j+1}} - \int_{t_{j-1}}^{t_{j+1}} \frac{\partial}{\partial t_j} \xi_n(t_n; t)\, E_l(t_j)\, dt_j,$$
where $E_l(t) = \int_0^t e_l(s)\, ds$. Now we use substitution in (4.12) by renaming $t_n^j$ in the following way for each $j$: with $s_i = t_i$ for all $i \le j-1$ and $s_i = t_{i+1}$ for all $i > j-1$, we have $s_{n-1} := t_n^j$, and we set $s_0 = 0$ and $s_n = t$. Moreover, we denote by $s_{n-1,r}$, $r = 1, \ldots, n-1$, the vector that is generated from $s_{n-1}$ by taking $s_r$ twice. To finalise this notation, we set $s_{n-1,0} = (s_0, s_1, \ldots, s_{n-1})$ and $s_{n-1,n} = (s_1, \ldots, s_{n-1}, s_n)$. Then the boundary terms take the form
$$\Big[\xi_n(t_n; t)\, E_l(t_j)\Big]_{t_j = t_{j-1}}^{t_j = t_{j+1}} = \xi_n(s_{n-1,j}; t)\, E_l(s_j) - \xi_n(s_{n-1,j-1}; t)\, E_l(s_{j-1}), \qquad j = 1, \ldots, n.$$
Now, using the Cauchy-Schwarz inequality, we deduce an upper bound for $\psi_l^2(s_{n-1}; t)$ in terms of the factor $(n+1)$, the quantities $\xi_n^2(s_{n-1,r}; t)$, $r = 0, \ldots, n$, and $E_l^2$. Recall the identity $\xi_n(\cdot\,; t) = (n!)^{-1} E[D^n X_t]$. Using similar arguments as in the proof of Proposition 4.1, we obtain a corresponding inequality for the partial derivatives $\frac{\partial}{\partial t_i} \xi_n(t_1, \ldots, t_n; t)$. Since $|s_j - s_{j-1}| \le t$, we finally conclude the estimate (4.14). Consequently, we deduce an upper bound for $\sum_{|\alpha| = n,\, i_n = l} x_\alpha(t)^2$, and putting things together we conclude the bound (4.15) for $A_2(p,k)$. Combining (4.9) and (4.15) we obtain the result of Theorem 3.3.

Numerical results and sparse truncation
To get an idea of the performance of the polynomial chaos expansion for different bases and truncations, we apply the trigonometric basis from Example 3.4 (i) and the Haar basis from Example 3.4 (ii) to the geometric Brownian motion from Example 3.2, on the horizon $t \in [0,1]$ with $b = \sigma = x_0 = 1$. In this setting we have the identity
$$\mathrm{Var}(X_t) = x_0^2\, e^{2bt}\big(e^{\sigma^2 t} - 1\big),$$
which can be used to compare the absolute errors of the approximated variances for different bases and values of $k$ and $p$. We used the ode45 solver with standard settings within Matlab 2017b (64bit, single core) on an Intel Core i5-6300U (2.4GHz, 8GB RAM). Figure 1 shows the absolute errors between the variances of the analytical solution and different polynomial chaos expansions. One observes the fast decay of the approximation error for fixed $k$ and increasing $p$, and vice versa. The main difference between the observed errors is that the maximal error of the trigonometric expansion is always attained at the end of the time horizon at $t = 1$, while the Haar basis leads to almost zero error on $k$ equidistant time points, including $t = 0$ and $t = 1$. Table 1 in the Appendix lists approximation errors, numbers of coefficients, and computational times for several choices of $p$ and $k$. The exponential increase in the number of coefficients with increasing $p$ and $k$ leads to an exponential increase in runtime, as expected. To overcome this issue, we propose a heuristic to reduce the computational burden.
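While Figure 1 is not reproduced here, the qualitative error behaviour of the trigonometric expansion can be sketched from the explicit GBM coefficients of Example 3.2. The cosine basis of $L^2([0,1])$ and the closed-form variance of the truncation are our illustrative assumptions:

```python
import math

b, sigma, x0, p, k = 1.0, 1.0, 1.0, 2, 5

def S(t):
    """sum_{j<=k} (int_0^t e_j(s) ds)^2 for the cosine basis of L^2([0,1])."""
    s = t ** 2
    s += sum(2.0 * math.sin((j - 1) * math.pi * t) ** 2 / ((j - 1) * math.pi) ** 2
             for j in range(2, k + 1))
    return s

def var_error(t):
    # exact Var(X_t) minus the variance of the truncation X^{p,k}_t, using
    # sum_{1<=|a|<=p} x_a(t)^2 = x0^2 e^{2bt} sum_{n=1}^p (sigma^2 S(t))^n / n!
    exact = x0 ** 2 * math.exp(2 * b * t) * (math.exp(sigma ** 2 * t) - 1.0)
    trunc = x0 ** 2 * math.exp(2 * b * t) * sum(
        (sigma ** 2 * S(t)) ** n / math.factorial(n) for n in range(1, p + 1))
    return exact - trunc

errs = {t: var_error(t) for t in (0.25, 0.5, 0.75, 1.0)}
# the variance error grows towards the end of the horizon, with maximum at t = 1
```

Since $S(t) \le t$ by Bessel's inequality, the variance error is always nonnegative, and for this basis it indeed peaks at $t = 1$, in line with the behaviour described for Figure 1.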

Sparse truncation
The information contained in the coefficient functions $x_\alpha(\cdot)$ decays both with increasing order $p$ of the basis polynomials $\Psi_\alpha$ and, at the decay rate of the Gaussian expansion, with increasing index $k$ of the basis elements $e_k$ used for constructing $\Psi_\alpha$ [22].
Hence, we can define sparse index sets that reduce the number of multi-indices $\alpha$ relative to the full truncated index set $\mathcal{I}_{p,k}$ without losing too much information, i.e., accuracy in a numerical simulation.
Definition 1 (Sparse truncation of first order). Let $p$ be the maximal order of the indices $\alpha \in \mathcal{I}_{p,k}$. Then the first order sparse index $r = (r_1, \ldots, r_k)$ satisfies $p = r_1 \ge r_2 \ge \cdots \ge r_k$ and we define the first order sparse index set
$$\mathcal{I}^r_{p,k} := \big\{\alpha \in \mathcal{I}_{p,k} :\ \alpha_i \le r_i \text{ for all } 1 \le i \le k\big\}.$$

Example 5.1. Let $k = 5$ and $p = 3$. Then a possible choice of the sparse index is $r = (3, 2, 2, 1, 1)$. For constructing the first order polynomials all five first order basis polynomials can be used. The second order polynomials are comprised of all possible products of first order basis polynomials depending on $e_1, \ldots, e_5$ and the second order basis polynomials of $e_1, e_2, e_3$. The third order polynomials are constructed analogously. By using this first order sparse index set $\mathcal{I}^r_{p,k}$ the number of coefficient functions $x_\alpha(\cdot)$ appearing within the propagator system (3.2) can be reduced drastically without impairing the solution much. The full index set $\mathcal{I}_{p,k}$ consists of $\frac{(k+p)!}{k!\,p!} = 56$ terms, whereas this first order sparse index set includes 42 terms.
An even more rigorous reduction of the number of coefficients included in the propagator system can be achieved by using a second order sparse index, i.e., a series of first order sparse indices $(r^j)_{j=0,\ldots,p}$ that depend on the actual order $j = |\alpha|$ of the polynomials $\Psi_\alpha$ [22]. This approach allows us to exclude cross products of random variables $W(e_i)$ that add only negligible information to the system from the construction of higher order basis polynomials $\Psi_\alpha$.

Definition 2 (Sparse truncation of second order). Let $p$ be the maximal order of the indices $\alpha \in \mathcal{I}_{p,k}$. Then the second order sparse index $(r) = (r^j)_{j \le p}$ is a series of first order sparse indices $r^j = (r^j_1, \ldots, r^j_k)$ satisfying $j = r^j_1 \ge r^j_2 \ge \cdots \ge r^j_k$ for all $j = |\alpha| \le p$, and we define the second order sparse index set
$$\mathcal{I}^{(r)}_{p,k} := \big\{\alpha \in \mathcal{I}_{p,k} :\ \alpha_i \le r^{|\alpha|}_i \text{ for all } 1 \le i \le k\big\}.$$

Example 5.2. Considering again the setting of the previous example with $k = 5$ and $p = 3$, one possible choice of a second order sparse index is given via $r^1 = (1, 1, 1, 1, 1)$, $r^2 = (2, 2, 2, 1, 0)$, and $r^3 = (3, 2, 0, 0, 0)$. In constructing basis polynomials of order $|\alpha| = 3$ we can use all combinations of basis polynomials depending on the first two random variables $W(e_1)$ and $W(e_2)$ up to orders 3 and 2, respectively. Thus, these are $\sqrt{6} H_3(W(e_1))$, $\sqrt{2} H_2(W(e_1)) H_1(W(e_2))$, and $\sqrt{2} H_1(W(e_1)) H_2(W(e_2))$; compare [22]. Table 2 in the Appendix lists the first and second order sparse indices that were used for the numerical study. The results in Table 1 show that, at the price of a slightly higher error, and of course the loss of a guaranteed upper bound as in (3.6), the computational times can be reduced by several orders of magnitude.
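The index sets of Definitions 1 and 2 can be enumerated directly. The following sketch (function names are ours; the choice $r^0 = (0, \ldots, 0)$ for the empty multi-index is our convention) reproduces the counts 56 and 42 from Example 5.1 and the corresponding count for the second order sparse index of Example 5.2:

```python
from itertools import product

def full_set(p, k):
    """The full truncated index set I_{p,k}: multi-indices with |a| <= p over k basis elements."""
    return [a for a in product(range(p + 1), repeat=k) if sum(a) <= p]

def sparse1(p, k, r):
    """First order sparse index set: additionally a_i <= r_i for all i."""
    return [a for a in full_set(p, k) if all(a[i] <= r[i] for i in range(k))]

def sparse2(p, k, rs):
    """Second order sparse index set: rs[j] is the sparse index used when |a| = j."""
    return [a for a in full_set(p, k)
            if all(a[i] <= rs[sum(a)][i] for i in range(k))]

p, k = 3, 5
r = (3, 2, 2, 1, 1)                       # Example 5.1
rs = {0: (0, 0, 0, 0, 0),                 # our convention for |a| = 0
      1: (1, 1, 1, 1, 1),                 # Example 5.2
      2: (2, 2, 2, 1, 0),
      3: (3, 2, 0, 0, 0)}
counts = (len(full_set(p, k)), len(sparse1(p, k, r)), len(sparse2(p, k, rs)))
# counts -> (56, 42, 18)
```

The drop from 56 to 18 multi-indices illustrates why the second order sparse truncation reduces the size of the propagator system, and hence the runtime, so substantially.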