Large deviation principle for one-dimensional SDEs with discontinuous coefficients

We establish the large deviation principle for solutions of one-dimensional SDEs with discontinuous coefficients. The main statement is formulated in a form similar to the classical Wentzel-Freidlin theorem, but under the considerably weaker assumption that the coefficients have no discontinuities of the second kind.


Introduction and the main result
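This paper establishes the large deviation principle (LDP) for the solutions to the SDEs
$$dX^\varepsilon_t = a(X^\varepsilon_t)\,dt + \varepsilon\sigma(X^\varepsilon_t)\,dW_t, \quad X^\varepsilon_0 = x_0, \tag{1}$$
with possibly discontinuous coefficients $a, \sigma$. Recall that a family of (the distributions of) random elements $\{X^\varepsilon\}$ taking values in a Polish space $\mathbb{X}$ is said to satisfy the LDP with rate function $I : \mathbb{X} \to [0, \infty]$ and speed function $r$ if
$$\limsup_{\varepsilon \to 0} r(\varepsilon) \log P(X^\varepsilon \in F) \le -\inf_{x \in F} I(x), \tag{2}$$
$$\liminf_{\varepsilon \to 0} r(\varepsilon) \log P(X^\varepsilon \in G) \ge -\inf_{x \in G} I(x) \tag{3}$$
for each closed $F \subset \mathbb{X}$ and for each open $G \subset \mathbb{X}$. The rate function is assumed to be lower semicontinuous; that is, all level sets $\{x : I(x) \le c\}$, $c \ge 0$, are closed. If all level sets are compact, then the rate function is called good. We assume condition (4): for some $C, c > 0$, the drift satisfies the linear growth bound $|a(x)| \le C(1 + |x|)$, and $\sigma$ is bounded and bounded away from zero, with the bounds expressed through $c$ and $C$. It is well known that, in this case, the SDE (1) has a unique weak solution, which can be obtained by a proper combination of the time change transformation of a Wiener process and the Girsanov transformation of the measure; see [10], IV, §4. In what follows, we fix $T > 0$, interpret the (weak) solution $X^\varepsilon = \{X^\varepsilon_t, t \in [0, T]\}$ to (1) as a random element in $C(0, T)$, and prove the LDP for the family $\{X^\varepsilon\}$. Since the law of $X^\varepsilon$ does not depend on a possible change of the sign of $\sigma$, in what follows, we assume without loss of generality that $\sigma > 0$.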
Our principal regularity assumption on the coefficients $a, \sigma$ is that they have no discontinuities of the second kind, that is, they have left- and right-hand limits at every point $x \in \mathbb{R}$. For a given pair of such functions $a, \sigma$, we define the modified functions $\bar a, \bar\sigma$ as follows: (i) if $a(x-) \ge 0$ and $a(x+) \le 0$, then $\bar a(x) = 0$ and $\bar\sigma(x) = \sigma(x)$; (ii) otherwise, $\bar a(x), \bar\sigma(x)$ equal either $a(x-), \sigma(x-)$ or $a(x+), \sigma(x+)$, with the choice made in such a way that
$$\frac{\bar a^2(x)}{\bar\sigma^2(x)} = \min\left\{\frac{a^2(x-)}{\sigma^2(x-)}, \frac{a^2(x+)}{\sigma^2(x+)}\right\}.$$
Denote by $AC(0, T)$ the class of absolutely continuous functions $\varphi : [0, T] \to \mathbb{R}$, and for each $f \in AC(0, T)$, denote by $\dot f$ its derivative, which is well defined for almost all $t \in [0, T]$. Theorem 1. Let $a, \sigma$ satisfy (4) and have no discontinuities of the second kind. Then the family of distributions in $C(0, T)$ of the solutions to SDEs (1) satisfies the LDP with the speed function $r(\varepsilon) = \varepsilon^2$ and the good rate function equal to
$$I(f) = \frac12 \int_0^T \frac{(\dot f_t - \bar a(f_t))^2}{\bar\sigma^2(f_t)}\,dt$$
for $f \in AC(0, T)$ with $f_0 = x_0$ and $I(f) = \infty$ otherwise.
Theorem 1 has a form very similar to the classical Wentzel-Freidlin theorem ([9], Chapter 3, §2), which establishes the LDP in a much more general setting, for multidimensional Markov processes that may contain both diffusive and jump parts. However, the Wentzel-Freidlin approach substantially exploits the continuity of the infinitesimal characteristics of the process. A natural question arises: to what extent can the continuity assumption be relaxed in this theory? In [5,6], the LDP was established for multidimensional diffusions with unit diffusion matrix and drift coefficients discontinuous along a given hyperplane; see also [1,2,4] for some other results in this direction. In [11], this result, in the one-dimensional setting, is extended to the case of piecewise smooth drift and diffusion coefficients with one common discontinuity point. The technique in the aforementioned papers is based on the analysis of the joint distribution of the process itself and its occupation time in the half-space above the discontinuity point (surface) and is hardly applicable when the structure of the discontinuity sets for the coefficients is more complicated. In [13], the LDP for a one-dimensional SDE with zero drift coefficient was established under a very mild regularity condition on $\sigma$: it was only assumed that its discontinuity set has zero Lebesgue measure. Extending this result to the case of a nonzero drift coefficient is far from trivial. In [14], such an extension was provided, but the assumption therein that $a/\sigma^2$ possesses a bounded derivative is definitely too restrictive. In this paper, we summarize the studies from [13] and [14]; note that the assumption on $\sigma$ in the current paper is slightly stronger than in [13].
We note that our main result, Theorem 1, illustrates well the relation of the LDP with discontinuous coefficients to the classical Wentzel-Freidlin theory: the rate function in this theorem is given in the classical form, but with properly modified coefficients. The heuristics behind this modification are transparent. Namely, thanks to (ii), the rate functional $I$ is lower semicontinuous; see Section 2.2. Assertion (i) corresponds to the fact that, in the case $a(x-) \ge 0$ and $a(x+) \le 0$, the family $X^\varepsilon$ with $X^\varepsilon_0 = x$ converges weakly to the constant function equal to $x$. We interpret the limiting function as the solution to the ODE $\dot x_t = \bar a(x_t)$ and note that a similar ODE for $a$ may fail to have a solution at all.
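A small worked example may make the role of rule (i) concrete. The coefficients below ($a(x) = -\operatorname{sgn}(x)$ with the value $a(0) = -1$, and $\sigma \equiv 1$) are an illustrative assumption and are not taken from the paper.

```latex
% Illustrative assumption: a(x) = -sgn(x) for x \ne 0, a(0) = -1, \sigma \equiv 1.
\begin{align*}
& a(0-) = 1 \ge 0, \quad a(0+) = -1 \le 0
  \quad\Longrightarrow\quad \bar a(0) = 0, \ \bar\sigma(0) = 1
  \quad \text{(rule (i))}. \\
& \text{Modified ODE: } \dot x_t = \bar a(x_t), \ x_0 = 0,
  \text{ is solved by } x_t \equiv 0. \\
& \text{Original ODE: } \dot x_t = a(x_t), \ x_0 = 0.
  \text{ For a.a. } t \text{ with } x_t \ne 0: \
  \tfrac{d}{dt}|x_t| = \operatorname{sgn}(x_t)\,a(x_t) = -1, \\
& \text{and } \tfrac{d}{dt}|x_t| = 0 \text{ for a.a. } t \text{ with } x_t = 0,
  \text{ so } |x_t| \le |x_0| = 0, \text{ i.e. } x \equiv 0; \\
& \text{but then } \dot x_t = 0 \ne -1 = a(0),
  \text{ so no absolutely continuous solution exists.} \\
& \text{Rate of the constant path } f \equiv 0: \quad
  I(f) = \frac12 \int_0^T \frac{(0 - \bar a(0))^2}{\bar\sigma^2(0)}\,dt = 0 .
\end{align*}
```

In this example the zero path carries zero rate only after the modification; with the unmodified value $a(0) = -1$ the same integral would equal $T/2$.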

Exponential tightness, contraction principle
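Recall that a family $\{X^\varepsilon\}$ is called exponentially tight with the speed function $r(\varepsilon)$ if, for each $Q > 0$, there exists a compact set $K \subset \mathbb{X}$ such that
$$\limsup_{\varepsilon \to 0} r(\varepsilon) \log P(X^\varepsilon \notin K) \le -Q.$$
For an exponentially tight family, the LDP is equivalent to the weak LDP; the latter by definition means that the upper bound (2) holds for all compact sets $F$, whereas the lower bound (3) still holds for all open sets $G$. An equivalent formulation of the weak LDP is the following: for each $x \in \mathbb{X}$,
$$\lim_{\delta \to 0} \limsup_{\varepsilon \to 0} r(\varepsilon) \log P(X^\varepsilon \in B_\delta(x)) = \lim_{\delta \to 0} \liminf_{\varepsilon \to 0} r(\varepsilon) \log P(X^\varepsilon \in B_\delta(x)) = -I(x), \tag{5}$$
where $B_\delta(x)$ denotes the open ball with center $x$ and radius $\delta$.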
To prove (5), we will use a certain extension of the contraction principle, which in its classical form (e.g., [8], Section 3.1, and [7], Section 4.2.1) states the LDP for a family $X^\varepsilon = F(Y^\varepsilon)$, where $Y^\varepsilon$ is a family of random elements in a Polish space $\mathbb{Y}$ that satisfies an LDP with a good rate function $J$, and $F : \mathbb{Y} \to \mathbb{X}$ is a continuous mapping. The rate function for $X^\varepsilon$ in this case has the form
$$I(x) = \inf\{J(y) : F(y) = x\}.$$
In the sequel, we use two different representations of our particular family $X^\varepsilon$ as an image of a certain family whose LDP is well understood; however, the functions $F$ in these representations fail to be continuous. Within such a framework, the following general lemma turns out to be quite useful. We denote by $\rho_{\mathbb{X}}, \rho_{\mathbb{Y}}$ the metrics in $\mathbb{X}, \mathbb{Y}$ and by $\Lambda_F$ the set of continuity points of a mapping $F : \mathbb{Y} \to \mathbb{X}$. Note that $\Lambda_F$ is Borel measurable; see Appendix II in [3].
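As a quick sanity check of the classical contraction principle, one can work out a standard textbook instance. The choice $Y^\varepsilon = \varepsilon W$ and $F(y) = y_T$ below is an illustrative assumption, not a construction used in the paper.

```latex
% Illustrative assumption: Y^\varepsilon = \varepsilon W on C(0,T), F(y) = y_T (continuous).
\begin{align*}
J(y) &= \frac12 \int_0^T \dot y_t^2 \, dt
  \quad (y \in AC(0,T),\ y_0 = 0), \qquad J(y) = \infty \text{ otherwise}, \\
J(y) &\ge \frac{1}{2T} \Bigl( \int_0^T \dot y_t \, dt \Bigr)^2 = \frac{x^2}{2T}
  \quad \text{whenever } y_T = x \quad \text{(Cauchy--Schwarz)}, \\
I(x) &= \inf \{ J(y) : y_T = x \} = \frac{x^2}{2T},
  \quad \text{attained at } y_t = \frac{x t}{T} .
\end{align*}
```

This agrees with the direct computation for $\varepsilon W_T \sim N(0, \varepsilon^2 T)$, for which $\varepsilon^2 \log P(\varepsilon W_T \ge x) \to -x^2/(2T)$ for $x > 0$.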

Lemma 1.
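Let a family $Y^\varepsilon$ satisfy the LDP with speed function $r(\varepsilon)$ and rate function $J$. Assume also that $P(Y^\varepsilon \in \Lambda_F) = 1$ for each $\varepsilon$. Then, for any $x \in \mathbb{X}$, the family $X^\varepsilon = F(Y^\varepsilon)$ satisfies the local upper bound (6) with a function $I_{upper}$ and the local lower bound (7) with a function $I_{lower}$, where $I_{upper}(x)$ and $I_{lower}(x)$ are defined through the infima of $J$ over the sets $\Xi_{\gamma,\delta}(x)$ and $\Theta_\delta(x)$, respectively, and $\Theta_\delta(x)$ consists of the continuity points $y \in \Lambda_F$ with $F(y) \in B_\delta(x)$. Proof. Since $P(Y^\varepsilon \in \Lambda_F) = 1$, we have $P(X^\varepsilon \in B_\delta(x)) = P(Y^\varepsilon \in \Theta_\delta(x))$. Thus, the upper bound in the LDP for $\{Y^\varepsilon\}$ gives
$$\limsup_{\varepsilon \to 0} r(\varepsilon) \log P(X^\varepsilon \in B_\delta(x)) \le -\inf_{y \in \bar\Theta_\delta(x)} J(y),$$
where $\bar\Theta_\delta(x)$ denotes the closure of $\Theta_\delta(x)$. Since $\bar\Theta_\delta(x) \subset \Xi_{\gamma,\delta}(x)$ for any $\gamma > 0$, this provides (6). The proof of (7) is even simpler: for any $y \in \Theta_\delta(x)$, there exists $r > 0$ such that the image of the ball $B_r(y)$ under $F$ is contained in $B_\delta(x)$, which yields (7) by the lower bound in the LDP for $\{Y^\varepsilon\}$. Lemma 1 is a simplified and more precise version of Lemma 4 in [12]. The functions $I_{upper}, I_{lower}$ are lower semicontinuous: this is easily shown using that, for any sequence $x_n \to x$, the sets $\Theta_{\delta/2}(x_n)$ are embedded into $\Theta_\delta(x)$ for $n$ large enough (see, e.g., Proposition 3 in [12]). In fact, Lemma 1 says that, for an arbitrary image of a family $\{Y^\varepsilon\}$, one part of an LDP (the upper bound) holds with one rate function, whereas the other part (the lower bound) holds with another rate function. This is our reason to call (6) and (7) the upper and the lower semicontraction principles. To prove (5), it suffices to verify the inequalities
$$I_{lower}(x) \le I(x), \quad I_{upper}(x) \ge I(x). \tag{8}$$
We refer to [12] for more discussion and an example where the pair of semicontraction principles does not provide an LDP.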

Lower semicontinuity of I
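In this section, we prove directly that the functional $I$ specified in Theorem 1 is lower semicontinuous, that is, it is indeed a rate functional. This explains the particular choice of the modified functions $\bar a, \bar\sigma$. In addition, this simplifies the proofs, where we will use the representation for $I(f)$ given below. Define
$$S(x) = \int_0^x \frac{\bar a(z)}{\bar\sigma^2(z)}\,dz, \quad x \in \mathbb{R}.$$
Then $I(f)$, if it is finite, can be represented as $I(f) = I_1(f) + I_2(f) + I_3(f)$ with
$$I_1(f) = \frac12 \int_0^T \frac{\dot f_t^2}{\bar\sigma^2(f_t)}\,dt, \quad I_2(f) = S(f_0) - S(f_T), \quad I_3(f) = \frac12 \int_0^T \frac{\bar a^2(f_t)}{\bar\sigma^2(f_t)}\,dt.$$
The function $S$ is continuous; hence, the functional $I_2$ is continuous. The function $\bar a^2/\bar\sigma^2$ is lower semicontinuous by the choice of $\bar a, \bar\sigma$; thus, the functional $I_3$ is lower semicontinuous. Finally, we can represent $I_1$ in the form
$$I_1(f) = \frac12 \int_0^T \Bigl(\frac{d}{dt}\Sigma(f_t)\Bigr)^2 dt, \quad \Sigma(x) = \int_0^x \frac{dz}{\bar\sigma(z)},$$
where the mapping $f \mapsto \Sigma(f_\cdot)$ is continuous, and the functional $g \mapsto \frac12 \int_0^T \dot g_t^2\,dt$ is known to be lower semicontinuous (this is just the rate functional for the family $\{\varepsilon W\}$). Hence, $I_1$ is lower semicontinuous, which completes the proof of the statement.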

Exponential tightness and the weak LDP
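In this section, we prove that the family $\{X^\varepsilon\}$ is exponentially tight with the speed function $r(\varepsilon) = \varepsilon^2$. Note that
$$M^\varepsilon_t = \int_0^t \sigma(X^\varepsilon_s)\,dW_s$$
is a continuous martingale with the quadratic characteristics
$$\langle M^\varepsilon \rangle_t = \int_0^t \sigma^2(X^\varepsilon_s)\,ds,$$
whose derivative is bounded; see (4). Recall that $M^\varepsilon$ can be represented as a Wiener process with the time change $t \mapsto \langle M^\varepsilon \rangle_t$; see, for example, [10], II. §7. Then, for each $R$, the probability $P(\sup_{t \le T} \varepsilon|M^\varepsilon_t| > R)$ is dominated by the corresponding probability for a Wiener process on a bounded time interval. On the other hand, for each $\omega \in \Omega$ such that $\sup_{t \le T} \varepsilon|M^\varepsilon_t(\omega)| \le R$, the corresponding trajectory of $X^\varepsilon$ satisfies
$$|X^\varepsilon_t| \le |x_0| + R + C \int_0^t (1 + |X^\varepsilon_s|)\,ds, \quad t \in [0, T],$$
and therefore, by the Gronwall inequality,
$$\sup_{t \le T} |X^\varepsilon_t| \le (|x_0| + R + CT)e^{CT}.$$
Therefore, for any $Q > 0$, there exists $R$ such that
$$\limsup_{\varepsilon \to 0} \varepsilon^2 \log P\Bigl(\sup_{t \le T} |X^\varepsilon_t| > R\Bigr) \le -Q.$$
Next, recall the Arzelà-Ascoli theorem: for a closed set $K \subset C(0, T)$ to be compact, it is necessary and sufficient that it is bounded and equicontinuous. The family $\varepsilon M^\varepsilon$ is represented as a time-changed family $\varepsilon W^\varepsilon$, where each $W^\varepsilon$ is a Wiener process, and the derivative of $\langle M^\varepsilon \rangle_t$ is bounded by $C$. Using these observations, it is easy to deduce the exponential tightness for $\{\varepsilon M^\varepsilon\}$ from the well-known fact that the family $\{\varepsilon W\}$ is exponentially tight. On the other hand, for any $\omega$ such that the trajectory of $X^\varepsilon_t$ is bounded by $R$, the corresponding trajectory of the process $X^\varepsilon_t - \varepsilon M^\varepsilon_t$ satisfies the Lipschitz condition w.r.t. $t$ with the constant $C(1 + R)$. Combined with the previous calculation, this easily yields the required exponential tightness.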
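The tail estimate for the martingale part can be made explicit as follows. This is a sketch under the assumption that the quadratic characteristic satisfies $\langle M^\varepsilon \rangle_T \le S$ for some deterministic constant $S$ (a hypothetical name; such a bound follows from the boundedness of $\sigma$); the notation $\widetilde W$ for the time-changed Wiener process is likewise introduced only for this illustration.

```latex
% Assumption: <M^\varepsilon>_T \le S (deterministic), and M^\varepsilon_t = \widetilde W_{<M^\varepsilon>_t}.
\begin{align*}
P\Bigl( \sup_{t \le T} \varepsilon |M^\varepsilon_t| > R \Bigr)
  &\le P\Bigl( \sup_{s \le S} |\widetilde W_s| > R/\varepsilon \Bigr)
  && \text{(time change)} \\
  &\le 4\, P\bigl( \widetilde W_S > R/\varepsilon \bigr)
  && \text{(symmetry and the reflection principle)} \\
  &\le 4 \exp\Bigl( -\frac{R^2}{2 \varepsilon^2 S} \Bigr)
  && \text{(Gaussian tail bound)}, \\
\limsup_{\varepsilon \to 0} \varepsilon^2
  \log P\Bigl( \sup_{t \le T} \varepsilon |M^\varepsilon_t| > R \Bigr)
  &\le -\frac{R^2}{2S} .
\end{align*}
```

Under this assumption, any $R \ge \sqrt{2SQ}$ makes the right-hand side at most $-Q$, which is the form of bound needed for exponential tightness.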
In what follows, we proceed with the proof of (5). Since the state space $\mathbb{X} = C(0, T)$ is now specified, we change the notation and denote the points in this space by $f, g, \dots$. Since the set $B_1(f)$ is bounded, the law of $X^\varepsilon$ restricted to any $B_\delta(f)$, $\delta \le 1$, does not change if we change the coefficients $a, \sigma$ on the intervals $(-\infty, -R], [R, \infty)$ with $R > 0$ large enough. Hence, we furthermore assume the coefficients $a, \sigma$ to be constant on such intervals for some $R$.

Case I. Piecewise constant a, σ with one discontinuity point
We proceed with the further proof in a step-by-step way, increasing gradually the classes of the coefficients a, σ for which the corresponding LDP is proved. First, let a, σ be constant on the intervals (−∞, z) and (z, ∞) with some z ∈ R. Without loss of generality, we can assume that z = 0. Then we can use Theorem 2.2 [11], where the LDP with the speed function r(ε) = ε 2 is established for the pair (X ε , Z ε ) with The corresponding rate function in [11] is given in the following form. Denote a ± = a(0±), σ ± = σ(0±) and define the class Then the rate functional for (X ε , Z ε ) equals and, for all other pairs, I(f, ψ) = ∞. From this result, using the contraction principle (see Section 2), we easily derive the LDP for X ε with the rate function Now only a minor analysis is required to show that this rate function actually coincides with that specified in Theorem 1. First, we observe that This is obvious if either a − /σ 2 − ≤ a + /σ 2 + or a − > 0, a + < 0. In the case where a − /σ 2 − > a + /σ 2 + and a − , a + have the same sign, we can verify directly that L ′ z (0, y, z) have the same sign for z ∈ [0, 1], which completes the proof of the required identity.
We will repeatedly use the following fact, which follows easily from the change-of-variables formula: for any $f \in AC(0, T)$ and any set $A \subset \mathbb{R}$ with zero Lebesgue measure, the set
$$\{t \in [0, T] : f_t \in A, \ \dot f_t \ne 0\} \tag{10}$$
has zero Lebesgue measure; see, for example, Lemma 1 in [13]. Applying (10) with $A = \{0\}$, we conclude that, in the above expression for $I(f)$, the function $L$ can be changed to the integrand $(\dot f_t - \bar a(f_t))^2 / (2\bar\sigma^2(f_t))$ from Theorem 1, which completes the proof of Theorem 1 in this case.
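For the reader's convenience, here is one way the zero-measure fact can be derived from the change-of-variables (area) formula for absolutely continuous functions. The symbol $N_f$ (the number of preimages of a level) is notation introduced only for this sketch, and $A$ is taken to be Borel, which is not restrictive because every Lebesgue null set is contained in a Borel null set.

```latex
% Sketch; N_f(x) = \#\{ t \in [0,T] : f_t = x \} is notation introduced here.
\begin{align*}
\int_0^T \mathbf{1}_A(f_t)\, |\dot f_t| \, dt
  &= \int_{\mathbb{R}} \mathbf{1}_A(x)\, N_f(x) \, dx
  && \text{(area formula for } f \in AC(0,T)) \\
  &= 0
  && \text{(} A \text{ has zero Lebesgue measure)}, \\
\text{hence}\quad
\mathrm{Leb}\bigl\{ t \in [0,T] : f_t \in A,\ \dot f_t \ne 0 \bigr\} &= 0 .
\end{align*}
```

In words, an absolutely continuous path can sit in a null set $A$ for a positive amount of time only while its derivative vanishes.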

Case II. Piecewise constant a, σ
Let, for some z 1 < · · · < z m , the functions a, σ be constant on the intervals (−∞, z 1 ), . . , m}, which does not restrict the generality of the construction given further, and define the functions a k , σ k , k = 0, . . . , m by Consider a family of independent processes Y 0,ε , Y n,k,ε , k = 1, . . . , m, n ≥ 1, such that Y 0,ε solves SDE (1) with the coefficients a 0 , σ 0 and each Y n,k,ε solves a similar SDE with the coefficients a k , σ k and the initial value z k . Define iteratively the process X ε in the following way: putX ε equal Y 0,ε until the time moment until the first time moment τ 2 when this process hits {z k , k = 1, . . . , m} \ {z κ1 }. Iterating this procedure, we get a processX ε t with It follows from the strong Markov property of X ε thatX ε has the same law with X ε . Hence, the given construction in fact represents the law of X ε as the image of the joint law family of independent processes Y 0,ε , Y n,k,ε , k = 1, . . . , m, n ≥ 1.
Each of these processes is a solution to (1) with the corresponding coefficients having at most one discontinuity point; hence, the LDP for them is provided in the previous section. Our idea is to deduce the LDP for $X^\varepsilon$ via a version of the contraction principle. With this idea in mind, we first simplify the above representation.
For any fixed f ∈ C(0, T ), we can choose δ f > 0 small enough and N f large enough so that each g ∈ B δ (f ) has less than N ∆-oscillations on [0, T ]. Hence, if in the above construction, N is taken equal to N f , then the restriction of the law of X ε to any ball B δ (f ), δ ≤ δ f , equals to the same restriction of the image of the joint law of the finite family Y 0,ε , Y n,k,ε , k = 1, . . . , m, n = 1, . . . , N , under the mapping F specified before. We aim to verify (5), and we argue in the following way. We fix f and choose N = N f as before, so that the laws of X ε , restricted to B δ (f ) for δ small, can be obtained as the image under F specified before. Then we prove (8) at this particular point x = f , with I lower , I upper being constructed by this particular F . This yields the required weak LDP (5).
Within such an argument, we have to treat for any N the image under the corresponding F of the family of laws in Y = C(0, T ) 1+mN , which, according to the result proved in the previous section, satisfies the LDP with the rate function dt for y = y 0 , y n,k k≤m,n≤N such that y 0 , y n,k ∈ AC (0, T ), y 0 0 = x 0 , y n,k 0 = z k and J(y) = ∞ otherwise. To apply Lemma 1 in this setting, we first analyze the structure and the properties of the corresponding F .
Using the first fact, now it is easy to prove the second inequality in (8). If it fails, then for a given f , there exists a sequence {y l } such that F (y l ) → f and J(y l ) ≤ c < I(f ). Since the level set {y : J(y) ≤ c} is compact, we can assume without loss of generality that y l converge to some y; recall that J is lower semicontinuous and thus J(y) ≤ c. The function f possesses the above patchwork representation with the trajectories taken from y, some pasting points τ * 1 , . . . , τ * r , and some numbers κ * 1 , . . . , κ * r . From this representation it is clear that f ∈ AC (0, T ) and f 0 = x 0 : if this fails, then the same properties fail at least for one trajectory from the family y and thus J(y) = ∞, which contradicts to J(y) ≤ c. Hence, we have where we put τ * 0 = 0, τ * r+1 = T . Let x 0 be located on some interval (z k−1 , z k ), k = 2, . . . , m, say, x 0 ∈ (z 1 , z 2 ). Then, on the interval (0, τ * 1 ), the trajectory f is contained in the segment [z 1 , z 2 ]. The functions a 0 , σ 0 are constant and coincide with a,σ on (z 1 , z 2 ). In addition, a 0 = a(z 1 +) = a(z 2 −), σ 0 = σ(z 1 +) = σ(z 2 −); hence, by the choice ofā,σ we havē Then by (10) .
This gives a contradiction with inequalities J(y) ≤ c and I(f ) > c, which completes the proof of the second inequality in (8). The first inequality in (8) holds immediately for f such that I(f ) = ∞. We fix f with I(f ) < ∞ and γ > 0 and construct y γ such that F (y γ ) = f , the functions τ 1 (·), τ 2 (·), . . . are continuous at y γ , and J(y γ ) ≤ I(f )+γ. This completes the proof of (8).
The construction explained gives a cue for the choice of y = y γ (we omit the index γ to simplify the notation). We put y 0 equal to f until its first time moment τ * 1 of hitting the set {z 1 , z 2 } (we still assume that x 0 ∈ (z 1 , z 2 )). Then we extend y 0 to the entire time interval [0, T ], and we aim to make the integral small enough; that is, to make small the error in the second inequality in (11), which arises because of the integral of y 0 . If we put y 0 t = y τ * 1 + a 0 (t − τ * 1 ), then we obtain the trajectory at which the integral (12) equals zero; we call such a trajectory a zero-energy one. However, under such a choice, we may fail with the other our requirement that τ 1 (·) should be continuous at the point y. It is easy to verify that for such a continuity, it suffices that y 0 , if hitting {z 1 , z 2 } at a point, say, z 1 at every interval (τ * 1 , τ * 1 + δ), δ > 0, takes values both from (−∞, z 1 ) and (z 1 , ∞). We can perturb the zero-energy trajectory introduced above on a small time interval near τ * 1 in such a way that this new trajectory possesses the continuity property explained before, and the integral (12) is ≤ γ/N . Then we iterate this procedure. Observe that, for any k, by the construction of the functionā k there exists at least one corresponding zero-energy trajectory with the initial value z k , which now is defined as a solution to the ODĖ g t =ā k (g t ), t > 0, g 0 = z k .
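The existence of a zero-energy trajectory started at $z_k$ can be checked case by case. The sketch below assumes that $\bar a_k$ takes a constant value $a^-$ on $(-\infty, z_k)$ and a constant value $a^+$ on $(z_k, \infty)$; the symbols $a^\pm$ are introduced here only for illustration.

```latex
% Assumption: \bar a_k = a^- on (-\infty, z_k), \bar a_k = a^+ on (z_k, \infty).
\begin{align*}
& a^- \ge 0 \ge a^+ : && \bar a_k(z_k) = 0 \ \text{(rule (i))}, \quad g_t \equiv z_k ; \\
& a^+ > 0 : && g_t = z_k + a^+ t > z_k \ \text{for } t > 0, \quad
   \dot g_t = a^+ = \bar a_k(g_t) ; \\
& a^- < 0 : && g_t = z_k + a^- t < z_k \ \text{for } t > 0, \quad
   \dot g_t = a^- = \bar a_k(g_t) .
\end{align*}
% If rule (i) does not apply, then a^- < 0 or a^+ > 0, so the three cases are exhaustive.
% In each case \dot g_t = \bar a_k(g_t) for a.a. t, so the energy integral vanishes.
```

When $a^- < 0 < a^+$, both linear trajectories solve the ODE, so the zero-energy trajectory need not be unique.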
We have $\kappa^*_1$ uniquely determined by the trajectory $f$ (in fact, by the part of this trajectory up to time $\tau^*_1$). For $k \ne \kappa^*_1$, we define $y^{k,1}$ as the zero-energy trajectory on $[0, T]$ that starts from $z_k$ and corresponds to the coefficient $\bar a_k$. All these trajectories are "phantom" in the sense that they neither are involved in the representation of $f$ through $y$ nor contribute to $J(y)$. For $k = \kappa^*_1$, we define $y^{k,1}$ similarly to the above: it equals $f_{t + \tau^*_1}$ for $t \le \tau^*_2 - \tau^*_1$, and afterwards it is defined as a perturbation of a zero-energy trajectory that makes $\tau_2(\cdot)$ continuous at $y$ and keeps the corresponding energy integral at most $\gamma/N$. Repeating this construction $\le N$ times, we finally get the required function $y = y^\gamma$. This completes the proof of (8) and thus of (5). Together with the exponential tightness proved in Section 3.1, this completes the proof of the LDP in this case.

Case III. Piecewise constant a/σ 2 , general σ
In this section, we remove the assumption on a, σ to be piecewise constant, still keeping this assumption for a/σ 2 ; we also assume that a, σ are constant on (−∞, R] and [R, ∞) for some R. Our basic idea is to represent {X ε } as the image under a time changing transformation of a family {Y ε } and then to use the semicontraction principles. The same approach was used in [13], where the LDP was established for a solution of (1) with a ≡ 0; in this case, Y ε was taken in the form Y ε t = x 0 + εW t . In our current setting, the choice of the coefficients for the SDE that defines Y ε should take into account the common discontinuity points for a/σ and σ. This becomes visible both from an analysis of the proof of Theorem 1 in [13] and from the definition of the functionsā,σ, which combines the left-and right-hand values of both a and σ at the discontinuity points. The proper choice of the family is explained below. Some parts of the arguments are similar to those in [13]. We omit detailed proofs whenever it is possible to give a reference to [13] and focus on the particularly new points.
We assume a/σ 2 to be piecewise constant with discontinuity points z 1 < · · · < z m and put (with the convention ∅ = 1) Under such a choice,σ = συ, and thus the functionã/σ 2 equals a/σ 2 and is constant on each of the intervals (−∞, z 1 ), . . . , (z m , ∞). By construction,σ is constant on these intervals as well; hence,ã,σ fit the case studied in the previous section, and the required LDP holds for the family Y ε of the solutions to (1) with these coefficients and Y ε 0 = x 0 . This construction yields also the following property, which will be important below: the function a = (a/σ 2 )σ 2 does not change its sign on each of the intervals (−∞, z 1 ), . . . , (z m , ∞). Hence, denoting B = a 2 /σ 2 andB =ā 2 /σ 2 , we getB Fix ε > 0 and define , and X ε t = Y ε τt . Then X ε is a weak solution to (1) with X ε 0 = x 0 ; see [10], IV §7. In the above construction, η t ≥ c 2 t and thus τ t ≤ c −2 t; see (4). We putT = c −2 T , Y = C(0,T ), and define Y ε as a family of solutions to (1) with the coefficients a,σ and the time horizonT . Then the family X ε possesses a representation X ε = F (Y ε ) with the mapping F : Y → X defined by Observe that for F to be continuous at a point y ∈ Y, it suffices that y spends zero time in the set ∆ υ of the discontinuity points of the function υ; see [13], Lemma 1 and Corollary 1. Now ∆ υ ⊂ ∆ a ∪ ∆ σ is at most countable, and it is easy to see that the continuity set Λ F has probability 1 w.r.t. the distribution of each Y ε , that is, we can apply Lemma 1. Our further aim is to prove (8) in the above setting, which then would imply (5) and thus prove the LDP. The general idea of the proof is similar to that of Theorem 1 in [13], though particular technicalities differ substantially.
First, for a given f ∈ X, we describe explicitly the set F −1 ({f }). We put here we changed the variables r = τ s (y) and used that ds. Therefore, Then we conclude that that is, for any y ∈ F −1 ({f }), the part of its trajectory with t ≤ ζ T (f ) is uniquely defined. On the other hand, it is easy to show that any y ∈ Y satisfying (14) belongs to F −1 ({f }). Next, we denote byâ,σ the modified coefficients, which correspond to the coefficientsã,σ in the sense explained in Section 1. Sinceã = aυ 2 andσ = συ, we easily see thatâ at every continuity point x for υ. Then, for any y ∈ AC (0,T ) with y 0 = x 0 that spends zero time in the set ∆ υ , we have On the other hand, using (14) and making the time change s = π t (f ) with f = F (y), we get 1 2 ζT (F (y)) 0 (ẏ t −ā(y t )υ 2 (y t )) 2 σ 2 (y t )υ 2 (y t ) dt because now t = ζ s (f ) = τ s (y) and thus dt = ds υ 2 (y t ) ,ẏ t υ −2 (y t ) = (y τs(y) ) ′ s =ḟ s .
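The effect of the time change on the energy integral can be verified term by term. The computation below uses only the relations stated above, namely $t = \tau_s(y)$, $f_s = y_t$, $dt = ds/\upsilon^2(y_t)$, and $\dot y_t = \upsilon^2(y_t)\dot f_s$, and assumes that $y$ spends zero time in $\Delta_\upsilon$.

```latex
% Substitution t = \tau_s(y): y_t = f_s, \ \dot y_t = \upsilon^2(f_s)\,\dot f_s, \ dt = ds/\upsilon^2(f_s).
\begin{align*}
\frac12 \int_0^{\zeta_T(f)}
  \frac{\bigl( \dot y_t - \bar a(y_t)\,\upsilon^2(y_t) \bigr)^2}
       {\bar\sigma^2(y_t)\,\upsilon^2(y_t)} \, dt
 &= \frac12 \int_0^T
  \frac{\upsilon^4(f_s) \bigl( \dot f_s - \bar a(f_s) \bigr)^2}
       {\bar\sigma^2(f_s)\,\upsilon^2(f_s)} \cdot \frac{ds}{\upsilon^2(f_s)} \\
 &= \frac12 \int_0^T
  \frac{\bigl( \dot f_s - \bar a(f_s) \bigr)^2}{\bar\sigma^2(f_s)} \, ds
  = I(f) .
\end{align*}
```

All powers of $\upsilon$ cancel, which is why the part of $J(y)$ accumulated up to time $\zeta_T(f)$ coincides with $I(F(y))$.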
Thus, J(y) = I F (y) + J tail (y) with Now we are ready to proceed with the proof of the first inequality in (8).
We consider only $f$ such that $I(f) < \infty$; otherwise, the required inequality is trivial. Let us fix a function $y$ corresponding to $f$ by the following convention: it is given by identity (14) up to the time moment $t = \zeta_T(f)$ and follows a zero-energy trajectory afterward, that is, satisfies $\dot y_t = \hat a(y_t)$ a.e. w.r.t. the Lebesgue measure. We note that at least one such zero-energy trajectory exists (it may be nonunique, and in this case, we just fix one such trajectory). Indeed, by construction, $\tilde a$ is piecewise constant, so that the corresponding $\hat a$ is piecewise constant as well. The proper choice of $\hat a(z_k)$ at those points $z_k$ where $\tilde a(z_k-) > 0$, $\tilde a(z_k+) < 0$ yields that the above ODE, which determines a zero-energy trajectory, admits at least one solution.
If f spends zero time in the set ∆ υ of discontinuity points for υ, then the same property holds for the corresponding y constructed above. Indeed, the first part of the trajectory y is just the time-changed trajectory f , and the second part is a zeroenergy trajectory. The latter trajectory is piecewise linear, and we can separate a finite set of time intervals where it either (a) moves with a constant speed = 0 (and thus spends a zero time in the set ∆ υ , which has zero Lebesgue measure) or (b) stays constant (in this case, it equals z k for some k, and, by construction, υ is continuous at {z k }). Hence, we conclude that (16) holds and, moreover, J tail (y) = 0, that is, J(y) = I(f ). In addition, y ∈ Λ F , which gives for this f the required inequality For a general f , we will show that, for each δ > 0, there exists f δ such that f δ ∈ B δ (f ), I(f δ ) ≤ I(f ) + δ, and f δ spends zero time in ∆ υ ; since I lower is known to be lower semicontinuous, this will complete the proof of the first inequality in (8). Recall the decomposition I = I 1 + I 2 + I 3 from Section 2.2 and note that Hence, our aim is to construct a function f δ that is close to f both in the uniform distance and in d Σ , spends zero time in the set ∆ υ , and We decompose the time set into a disjoint union of open intervals and modify the function x on each of these intervals. On the complement to this union, the function f δ will remain the same; note that υ is continuous at every point z k , and hence in order to get a function that spends zero time in ∆ υ , it suffices to modify f on Q only. In what follows, we fix an interval (α, β) from the decomposition of the set Q and describe the way to modify f on (α, β). The construction below is mostly motivated by (13). We fix some γ > 0 and choose a finite partition {u j } of the set {f t , t ∈ [α, β]} such that the oscillation of the functionσ 2 on each interval (u j−1 , u j ) does not exceed γ. 
Then there exists a finite partition α = t 0 < · · · < t m = β such that, on each time segment [t i−1 , t i ], the function x visits at most one point from the set {u j }. Then, on each interval [t i−1 , t i ], we consider the family and s i is defined by the following convention: s i = +1 ifB is right-continuous at the (unique) point from the set {u j } that is visited by f on [t i−1 , t i ] or if f does not visit this set; otherwise, s i = −1. If, in addition,φ i t = 0 a.e., then for all κ > 0 except at most countable set of points, we have that f i,κ spends zero time in ∆ υ on the time interval; see [13], Lemma 2. The choice of the sign s i yields that, for κ > 0 small enough, Then κ > 0 can be chosen small enough and the same for all intervals [t i−1 , t i ], so that the corresponding functionf κ , which coincides with It is also easy to see that, in addition, the following inequalities can be guaranteed by the choice of (small) κ: Repeating the same construction on each interval from the partition for Q, we get a functionf such that andf spends zero time in ∆ υ . Taking in this construction γ > 0 small enough, we obtain the required function f δ =f , which completes the proof of (17).
Recall that υ is continuous at each point z k ; hence, by (15) identity (18) holds for all z ∈ R. Now we are ready to proceed with the proof of the second inequality in (8).
Then $\{y^n\}$ belongs to some level set $\{J(y) \le c\}$ of a good rate function $J$. Hence, passing to a subsequence, we can assume that both $\{y^n\}$ and $\{\tilde y^n\}$ converge to some $y \in \mathbb{Y}$. In addition, $J(y) \le \liminf_n J(y^n) < I(f)$. Next, denote $\tau^n_t = \tau_t(\tilde y^n)$, where $\tau(\cdot)$ is the function introduced in the definition of $F$. Then each $\tau^n \in AC(0, T)$ with its derivative taking values from $[C^{-2}, c^{-2}]$; see (4). This allows us, passing to a subsequence, to assume that there exists a uniform limit $\tau = \lim_n \tau^n$ and that $\dot\tau^n \to \dot\tau$ weakly in $L_2(0, T)$.
Now we will use (20) in order to compare $J_i$, $i = 1, 2, 3$, with $I_i(f)$, $i = 1, 2, 3$. We have directly that $J_2 = I_2(f)$. Next, we change the variables $s = \tau_t$. Recall that we assumed $\dot\tau$ to be the $L_2$-weak limit of $\dot\tau^n$.
Then by (18) we get Finally, changing the variables s = τ t , we get Denote Q = {t ∈ [0, T ] : f t ∈ ∆ υ } and recall that because ∆ υ has zero Lebesgue measure,ḟ t = 0 for a.a. t ∈ Q. On the other hand, if f t ∈ ∆ υ , then by (15) and (22) we have 1 σ 2 (f t )τ t = 1 σ 2 (f t ) ; thus, Summarizing the above, we get J(y) ≥ I(f ), which contradicts to the assumption made at the beginning of the proof.