Large deviations of regression parameter estimator in continuous-time models with sub-Gaussian noise

A continuous-time regression model with a jointly strictly sub-Gaussian random noise is considered in the paper. Upper exponential bounds for probabilities of large deviations of the least squares estimator for the regression parameter are obtained.


Introduction
The theory of large deviations in mathematical statistics and the statistics of stochastic processes deals with the asymptotic behaviour of the tails of distribution functions of parametric and nonparametric statistical estimators. Concerning parametric estimators it is necessary to refer to the monograph of Ibragimov and Has'minskii [6], where the exponential convergence rate of probabilities of large deviations of the maximum likelihood estimator was obtained. This result led to a large number of publications on large deviations of statistical estimators.
In what follows we discuss least squares estimators (l.s.e.'s) for parameters of a nonlinear regression model. In the paper of Ivanov [8] a statement was proved on the power-law rate of decrease of probabilities of large deviations of the l.s.e. for a scalar parameter in a nonlinear regression model with i.i.d. observation errors having moments of finite order. Prakasa Rao [13] obtained a similar result with an exponential rate of decrease for Gaussian nonlinear regression.
In the paper of Sieders and Dzhaparidze [15] a general theorem (Theorem 2.1) on probabilities of large deviations of M-estimators based on data of arbitrary structure was proved, generalizing the mentioned result of [6], with an application to the l.s.e. for parameters of nonlinear regression with pre-Gaussian and sub-Gaussian i.i.d. observation errors (Theorems 3.1 and 3.2 in [15]). Some further results in this direction were obtained by Ivanov [9].
The results on probabilities of large deviations of an l.s.e. in a nonlinear regression model with correlated observations can be found in the works of Ivanov and Leonenko [11], Prakasa Rao [14], Hu [4], Yang and Hu [16], Huang et al. [5].
Upper exponential bounds for probabilities of large deviations of an l.s.e. for a parameter of the nonlinear regression in discrete-time models with a jointly strictly sub-Gaussian (j.s.s.-G.) random noise were obtained in Ivanov [10]. In the present paper we extend some results of [10] to continuous-time observation models. Consider the regression model

X(t) = a(t, θ) + ε(t), t ≥ 0,    (1)

where a(t, τ), (t, τ) ∈ R_+ × Θ^c, is a continuous function, the true parameter value θ = (θ_1, ..., θ_q)′ belongs to an open bounded convex set Θ ⊂ R^q, and the random noise ε = {ε(t), t ∈ R} satisfies the following condition.

N1. ε is a mean-square and almost surely (a.s.) continuous stochastic process defined on a probability space (Ω, F, P), Eε(t) = 0, t ∈ R.
We shall write ∫ = ∫_0^T.

Definition 1. Any random vector θ_T = (θ_1T, ..., θ_qT)′ ∈ Θ^c having the property

Q_T(θ_T) = min_{τ ∈ Θ^c} Q_T(τ), Q_T(τ) = ∫_0^T (X(t) − a(t, τ))² dt,

is said to be the l.s.e. for the unknown parameter θ obtained from the observations {X(t), t ∈ [0, T]}. Under the assumptions introduced above at least one such random vector θ_T exists [12].
In the asymptotic theory of nonlinear regression, in the problem of normal approximation of the distribution of an l.s.e., the difference θ_T − θ is normed by the diagonal matrix [11]

d_T(θ) = diag(d_iT(θ), i = 1, ..., q), d_iT²(θ) = ∫_0^T (∂a(t, θ)/∂θ_i)² dt.

Further it is supposed that a(t, ·) ∈ C¹(Θ) for any t ≥ 0. The paper is organized as follows. In Section 2 an upper exponential bound is obtained for large deviations of d_T(θ)(θ_T − θ) in the regression model (1) with a j.s.s.-G. random noise ε. In Section 3 the results of Section 2 are applied to a stationary j.s.s.-G. noise ε. Section 4 contains examples of regression functions a and noises ε satisfying the conditions of our theorems.
2 Large deviations in models with a jointly strictly sub-Gaussian noise

Definition 2. A random vector ξ = (ξ_1, ..., ξ_n)′ ∈ R^n is called strictly sub-Gaussian (s.s.-G.) if for any λ = (λ_1, ..., λ_n)′ ∈ R^n

E exp⟨λ, ξ⟩ ≤ exp(⟨Bλ, λ⟩/2),    (2)

where B = Eξξ′ is the covariance matrix of ξ. Taking n = 1 in Definition 2 we obtain the definition of an s.s.-G. random variable (r.v.) ξ.
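As a quick numerical illustration of Definition 2 (not from the paper), consider a centered Rademacher variable ξ with P(ξ = ±1) = 1/2: its variance is 1 and E exp(λξ) = cosh λ, so strict sub-Gaussianity reduces to the classical inequality cosh λ ≤ exp(λ²/2). A minimal sketch:

```python
import math

# A centered Rademacher variable xi (P(xi = ±1) = 1/2) has variance 1 and
# moment generating function E exp(lambda * xi) = cosh(lambda).
# Strict sub-Gaussianity requires cosh(lambda) <= exp(lambda^2 * Var(xi) / 2).
def mgf_rademacher(lam: float) -> float:
    return math.cosh(lam)

def ssg_bound(lam: float, var: float = 1.0) -> float:
    return math.exp(lam * lam * var / 2.0)

# Check the inequality on a grid of lambda values.
ok = all(mgf_rademacher(l) <= ssg_bound(l) for l in [x / 10 for x in range(-50, 51)])
print(ok)  # True: a Rademacher variable is strictly sub-Gaussian
```

The inequality can also be seen termwise from the Taylor series: λ^{2k}/(2k)! ≤ λ^{2k}/(2^k k!).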
These definitions, together with more detailed information on sub-Gaussian r.v.'s, vectors, and stochastic processes, can be found in the monograph [1] by Buldygin and Kozachenko.
Concerning the random noise ε in the model (1) we introduce the following assumption.
N2. (i) ε is a jointly strictly sub-Gaussian (j.s.s.-G.) stochastic process, that is, every finite-dimensional vector (ε(t_1), ..., ε(t_n))′ is s.s.-G.

(ii) For any continuous function ∆(t), t ∈ [0, T], and some constant d_0 > 0,

E(∫_0^T ∆(t)ε(t) dt)² ≤ d_0² ∆_T², ∆_T = (∫_0^T ∆²(t) dt)^{1/2}.

Using condition 1) and the Fubini theorem one obtains this bound and can take d_0 = b_1; on the other hand, under the alternative condition another admissible value of d_0 can be taken.

Let ∆(t), t ∈ R_+, be a continuous function. Then condition N1 implies the existence of the integral ∫_0^T ∆(t)ε(t) dt, determined for almost all paths of the process ε(t), t ∈ [0, T], as a Riemann integral. Consider partitions of [0, T] with mesh r(n) → 0 as n → ∞ and the corresponding integral sums S_n. Then S_n → ∫_0^T ∆(t)ε(t) dt a.s. It is obvious also that each S_n, being a finite linear combination of values of ε, is an s.s.-G. r.v., and hence ∫_0^T ∆(t)ε(t) dt is an s.s.-G. r.v. for any T > 0.

Lemma 1. Under conditions N1 and N2, for any λ ∈ R,

E exp(λ ∫_0^T ∆(t)ε(t) dt) ≤ exp(λ² d_0² ∆_T² / 2).
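The pathwise Riemann-integral construction above can be illustrated numerically. A minimal deterministic sketch (the smooth stand-in path ε(t) = sin 5t and weight ∆(t) = e^{−t} are our own hypothetical choices, standing in for a continuous sample path of the noise):

```python
import math

# Stand-in continuous "noise path" and weight function (hypothetical choices,
# not from the paper): eps(t) = sin(5 t), Delta(t) = exp(-t), on [0, T] = [0, 1].
def eps(t): return math.sin(5.0 * t)
def delta(t): return math.exp(-t)

def riemann_sum(n: int, T: float = 1.0) -> float:
    # Left-endpoint integral sum over a uniform partition with n points.
    h = T / n
    return sum(delta(k * h) * eps(k * h) * h for k in range(n))

# Refining the partition, the integral sums stabilize, illustrating the
# pathwise Riemann integral of Delta(t) * eps(t) used in condition N2.
reference = riemann_sum(200000)
errors = [abs(riemann_sum(n) - reference) for n in (100, 1000, 10000)]
print(errors[0] > errors[1] > errors[2])  # True: the sums converge
```

For a left-endpoint rule the error decreases like 1/n, which is what the three partition sizes show.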

Lemma 2. Under conditions N1 and N2, for any T > 0, x > 0,

P(∫_0^T ∆(t)ε(t) dt > x) ≤ exp(−x²/(2d_0²∆_T²)),
P(∫_0^T ∆(t)ε(t) dt < −x) ≤ exp(−x²/(2d_0²∆_T²)),
P(|∫_0^T ∆(t)ε(t) dt| > x) ≤ 2 exp(−x²/(2d_0²∆_T²)).    (6)

Proof. The proof is standard (see, for example, [1]). For any x > 0, λ > 0, by the Chebyshev–Markov inequality, (2), and Lemma 1,

P(∫_0^T ∆(t)ε(t) dt > x) ≤ exp(−λx) E exp(λ ∫_0^T ∆(t)ε(t) dt) ≤ exp(−λx + λ² d_0² ∆_T² / 2).    (7)

Minimization of the right-hand side of (7) in λ gives the first inequality in (6). The proof of the second inequality is similar, and the third inequality follows from the first two.
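The minimization step in the proof of Lemma 2 can be checked numerically: for x, σ² > 0 the infimum over λ > 0 of exp(−λx + λ²σ²/2) equals exp(−x²/(2σ²)), attained at λ = x/σ². A minimal sketch (here σ² stands in for d_0²∆_T²; the concrete values x = 3, σ² = 2 are arbitrary):

```python
import math

# Chernoff step of Lemma 2:
#   inf_{lambda > 0} exp(-lambda x + lambda^2 sigma^2 / 2) = exp(-x^2 / (2 sigma^2)),
# attained at lambda = x / sigma^2.
x, sigma2 = 3.0, 2.0

def chernoff(lam: float) -> float:
    return math.exp(-lam * x + lam * lam * sigma2 / 2.0)

grid_min = min(chernoff(k / 1000.0) for k in range(1, 10001))  # lambda in (0, 10]
closed_form = math.exp(-x * x / (2.0 * sigma2))
print(abs(grid_min - closed_form) < 1e-6)  # True
```

The grid contains the optimal point λ = 1.5 exactly, so the grid minimum coincides with the closed form.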
We need some notation to formulate conditions on the regression function a(t, θ), using the approach of the paper [15] (see also [9,11]). For u, v ∈ U_T(θ) = d_T(θ)(Θ^c − θ) write

Φ_T(u, v) = ∫_0^T (a(t, θ + d_T^{-1}(θ)u) − a(t, θ + d_T^{-1}(θ)v))² dt.

Denote by G the family of all functions g = g_T(R), T > 0, R > 0, having the following properties: 1) for fixed T, g_T(R) ↑ ∞ as R → ∞; 2) for any r > 0, R^r exp(−κ g_T²(R)) → 0 as R → ∞. Let γ(R) be polynomials in R (possibly different in different formulas) with coefficients that do not depend on the values T, θ, u, v. Assume the existence of a function g ∈ G, constants δ ∈ (0, 1/2), κ > 0, ρ ∈ (0, 1], and polynomials γ(R) such that for sufficiently large T and R (we write T > T_0, R > R_0) the following conditions R1 and R2 are fulfilled.
Theorem 1. If conditions N1, N2, R1 and R2 are fulfilled, then there exist constants B_0, b_0 > 0 such that for T > T_0, R > R_0,

P(|d_T(θ)(θ_T − θ)| ≥ R) ≤ B_0 exp(−b_0 g_T²(R));    (12)

moreover, for any β > 0 the constant B_0 can be chosen appropriately.

Proof. To prove the theorem it is sufficient to check the fulfilment of assumptions (M1) and (M2) of Theorem 2.1 in [15], reformulated in the manner similar to that used in the proof of Theorem 3.1, ibid. From the first inequality in (6) of Lemma 2 with ∆(t) = ∆(t, u) and x = δΦ_T(u, 0), condition R2, and the fact that ζ_T(0) = 0 in our particular case, we obtain that (13) is true.
On the other hand, according to R1 (the polynomials γ(R) in the last two lines are different), we obtain the bound (15). By the formula for the moments of a nonnegative r.v. (see, for example, [2], and compare with [4]) and the third inequality of Lemma 2 applied to the corresponding integral, where I(T; u, v) = I(T, u) − I(T, v) and Z is a standard Gaussian r.v., relations (16), (17), and (8) lead to the bound (18). From (14), (15), and (18), inequality (12) follows.
Suppose there exists a diagonal matrix s_T = diag(s_iT, i = 1, ..., q) with elements that do not depend on τ ∈ Θ, and constants 0 < c_i < C_i < ∞, i = 1, ..., q, such that uniformly in τ ∈ Θ for T > T_0

c_i ≤ s_iT^{-1} d_iT(τ) ≤ C_i, i = 1, ..., q.    (19)

Then instead of the matrix d_T(θ) (at least within the topic of this paper) it is possible to consider, without loss of generality, the normalizing matrix s_T. The next condition is more restrictive than R1 and R2; however, it is simpler due to requirement (19).
R3. There exist numbers 0 < c_0 < c_1 < ∞ such that for any u, v ∈ U_T(θ) = s_T(Θ^c − θ) and T > T_0,

c_0 |u − v|² ≤ Φ_T(u, v) ≤ c_1 |u − v|².    (20)

It goes without saying that in the expression for Φ_T(u, v) in (20) we use the matrix s_T^{-1} instead of d_T^{-1}(θ). A condition of the type (20) was introduced in [8] and used in [13,15,4] and other works. The next theorem generalizes Theorem 3.2 from [15].

Theorem 2. Under conditions N1, N2 and R3 there exist constants B, b > 0 such that for T > T_0, R > R_0,

P(|s_T(θ_T − θ)| ≥ R) ≤ B exp(−bR²);    (21)

moreover, for any β > 0 the constant B can be chosen so that (22) holds.

Proof. We will show that R3 implies conditions R1 and R2. Inequality (8) of condition R1(i) follows from the right-hand side of inequality (20), if we take ρ = 1, γ(R) = c_1. The inequality of condition R1(ii) follows as well from the right-hand side of (20), if we take v = 0, γ(R) = c_1(R + 1)².

3 Large deviations in the case of a stationary jointly strictly sub-Gaussian noise
We impose an additional restriction on the noise process ε.

N3. The stochastic process ε is stationary with covariance function B(t) = Eε(0)ε(t), t ∈ R, and bounded spectral density f(λ), λ ∈ R:

B(t) = ∫_{−∞}^{∞} e^{iλt} f(λ) dλ, f_max = sup_{λ∈R} f(λ) < ∞.

Under assumption N3 the following corollaries of the theorems proved in Section 2 are true.

Corollary 1. Under conditions N1, N2(i), N3, R1 and R2 the statement of Theorem 1 is true.

Proof. We just need to show that condition N2(ii) is fulfilled. Indeed, by the Plancherel identity,

E(∫_0^T ∆(t)ε(t) dt)² = ∫_{−∞}^{∞} f(λ) |∫_0^T e^{iλt} ∆(t) dt|² dλ ≤ 2π f_max ∆_T²,

so that one can take d_0² = 2π f_max.

Corollary 2. Under conditions N1, N2(i), N3 and R3 the statement of Theorem 2 is true, with inequality (22) rewritten using d_0² = 2π f_max.

Our next assumption is a particularization of the requirements N2 and N3.

N4(i). The random noise ε is of the form

ε(t) = ∫_{−∞}^{t} ψ(t − s) dξ(s),    (24)

where ξ = {ξ(t), t ∈ R} is a mean-square continuous j.s.s.-G. stochastic process with orthogonal increments, Eξ(t) = 0. The stochastic integral in (24) is understood as a mean-square Stieltjes integral [3]. The process ξ is an integrated white noise, and ε can be considered as a stationary process at the output of a physically realizable filter with impulse response ψ, covariance function (see ibid.)

B(t) = ∫_0^{∞} ψ(s) ψ(s + |t|) ds,

and spectral density

f(λ) = (2π)^{-1} |∫_0^{∞} e^{−iλs} ψ(s) ds|².

Under condition N4(i) the process ε is j.s.s.-G.

Proof. Let n ≥ 1 be a fixed number and t_1, ..., t_n, ∆_1, ..., ∆_n arbitrary real numbers. It is necessary to prove that

E exp(∑_{k=1}^{n} ∆_k ε(t_k)) ≤ exp((1/2) Var ∑_{k=1}^{n} ∆_k ε(t_k)).    (25)

Formula (24) can be rewritten in the form ε(t) = ∫_{−∞}^{∞} ψ(t − s) dξ(s) with ψ(u) = 0 for u < 0. Denote ψ_k(s) = ψ(t_k − s), k = 1, ..., n. Then ε(t_k) = ∫_{−∞}^{∞} ψ_k(s) dξ(s). Let a sequence of simple functions ψ_k^{(m)} converge to ψ_k as m → ∞. Then the sequence of random variables ε_k^{(m)} = ∫ ψ_k^{(m)}(s) dξ(s) converges in mean square to ε(t_k). For any fixed m, the random vector with coordinates ε_k^{(m)}, k = 1, ..., n, is a linear transformation of the vector with coordinates ξ(η_{k′}^{(m)}), k′ = 1, ..., n′(m), where the η_{k′}^{(m)} are different real numbers. By condition N4(i) the latter vector is s.s.-G., and therefore

E exp(∑_{k=1}^{n} ∆_k ε_k^{(m)}) ≤ exp((1/2) Var ∑_{k=1}^{n} ∆_k ε_k^{(m)}).    (27)

From (26) it follows that ε_k^{(m)} → ε(t_k) in probability, k = 1, ..., n, as m → ∞, and thus there exists an a.s. convergent subsequence. Finally, the Fatou lemma and (27) yield (25).

Assume

lim inf_{T→∞} T^{-1/2} s_iT > 0, i = 1, ..., q.    (28)

Corollary 5. Under the conditions of Theorem 2 or Corollaries 2 and 4, and under (28), for any ρ > 0, ν ∈ [0, 1/2), and T > T_0,

P(|s_T(θ_T − θ)| ≥ ρT^{1/2−ν}) ≤ B exp(−bρ² T^{1−2ν}).    (29)

Proof. To show (29) it is sufficient to take R = ρT^{1/2−ν} in (21).
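The Plancherel bound in the proof of Corollary 1 can be checked numerically. A minimal sketch (the covariance B(t) = e^{−|t|}, whose spectral density f(λ) = 1/(π(1 + λ²)) has f_max = 1/π, and the weight ∆(t) = cos t are our own hypothetical choices):

```python
import math

# For B(t) = exp(-|t|) the spectral density is f(lambda) = 1 / (pi (1 + lambda^2)),
# so sup f = 1/pi and the bound of Corollary 1 reads
#   Var(int_0^T Delta(t) eps(t) dt) <= 2 pi * (1/pi) * Delta_T^2 = 2 * Delta_T^2.
T, n = 5.0, 500
h = T / n
ts = [(k + 0.5) * h for k in range(n)]  # midpoint grid on [0, T]

def delta(t): return math.cos(t)
def B(t): return math.exp(-abs(t))

# Var(int Delta eps) = double integral of Delta(t) Delta(s) B(t - s).
variance = sum(delta(t) * delta(s) * B(t - s) for t in ts for s in ts) * h * h
delta_T2 = sum(delta(t) ** 2 for t in ts) * h
print(variance <= 2.0 * delta_T2)  # True
```

The inequality is far from tight here because |∆̂| concentrates near λ = ±1, where f(λ) is only half of f_max.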
For ν = 0 we arrive at a rather strong result on the weak consistency of the l.s.e. Similarly, under the conditions of Corollary 5, a result on probabilities of moderate deviations of the l.s.e. holds for any h > 0 and T > T_0. Obviously, Gaussian stochastic processes ε are j.s.s.-G. ones, and all the previous results are valid for them.
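A Monte Carlo sketch of a tail bound of the form (21), in a toy discretized linear model of our own devising (not the paper's continuous-time setting): X_i = θg_i + e_i with i.i.d. N(0,1) errors, g_i = cos(0.05 i), and normalization d_n = (Σ g_i²)^{1/2}. Here d_n(θ̂ − θ) is exactly N(0,1), so its tail must lie below the sub-Gaussian bound 2 exp(−R²/2):

```python
import math, random

# Toy discrete-time analogue of the linear regression model with Gaussian noise.
random.seed(0)
theta, n, trials, R = 1.0, 100, 5000, 2.0
g = [math.cos(0.05 * i) for i in range(n)]
d_n = math.sqrt(sum(gi * gi for gi in g))

exceed = 0
for _ in range(trials):
    x = [theta * gi + random.gauss(0.0, 1.0) for gi in g]
    theta_hat = sum(xi * gi for xi, gi in zip(x, g)) / d_n ** 2  # least squares
    if abs(d_n * (theta_hat - theta)) >= R:
        exceed += 1

bound = 2.0 * math.exp(-R * R / 2.0)  # sub-Gaussian tail bound, about 0.27
print(exceed / trials <= bound)  # True: empirical tail below the bound
```

The empirical frequency is close to P(|Z| ≥ 2) ≈ 0.046 for a standard Gaussian Z, well under the bound 2e^{−2} ≈ 0.27.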

4 Two examples
In this section, we consider an example of a regression function satisfying the condition R3 and an example of the j.s.s.-G. process ξ from expression (24) in condition N4(i).
Example 4.1. Consider the regression function a(t, τ) = exp⟨y(t), τ⟩, where y(t), t ≥ 0, is a continuous function with values in a compact set Y ⊂ R^q, and assume that

J = lim_{T→∞} T^{-1} ∫_0^T y(t) y′(t) dt

is a positive definite matrix. In this case the regression function a(t, τ) satisfies condition R3. Indeed, let

H = max_{y∈Y, τ∈Θ^c} exp⟨y, τ⟩, L = min_{y∈Y, τ∈Θ^c} exp⟨y, τ⟩.

Then for any δ > 0 and T > T_0 the elements d_iT(τ) satisfy bounds of the form (19), and according to (19) we can take s_T = T^{1/2} I_q, with the identity matrix I_q of order q. For a fixed t, by the mean value theorem, the increment a(t, θ + s_T^{-1}u) − a(t, θ + s_T^{-1}v) can be estimated from above, and therefore for any δ > 0 and T > T_0 we obtain the right-hand side of (20) with the constant c_1 > H² Tr J.
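The two-sided bound (20) for this exponential regression can be checked numerically in a scalar setting. A minimal sketch (all concrete choices are ours, not the paper's: q = 1, y(t) = cos t with Y = [−1, 1], Θ^c = [−1, 1], s_T = T^{1/2}); by the mean value theorem the bound L²K_T(u−v)² ≤ Φ_T(u, v) ≤ H²K_T(u−v)² holds with K_T = T^{-1}∫_0^T cos²t dt, H = e, L = 1/e:

```python
import math

# Scalar exponential regression a(t, tau) = exp(y(t) * tau), y(t) = cos(t).
T, n = 50.0, 20000
h = T / n
ts = [(k + 0.5) * h for k in range(n)]
theta, u, v = 0.2, 1.5, -0.7  # true parameter and two normalized points in U_T

def a(t, tau): return math.exp(math.cos(t) * tau)

# Phi_T(u, v) with normalization s_T = sqrt(T).
phi = sum((a(t, theta + u / math.sqrt(T)) - a(t, theta + v / math.sqrt(T))) ** 2
          for t in ts) * h
K_T = sum(math.cos(t) ** 2 for t in ts) * h / T  # approximates J = 1/2
H, L = math.e, 1.0 / math.e
lower = L * L * K_T * (u - v) ** 2
upper = H * H * K_T * (u - v) ** 2
print(lower <= phi <= upper)  # True: the two-sided quadratic bound of (20)
```

The sandwich holds term by term in the discretized sums, since each increment equals y(t)e^{y(t)τ*}(u − v)/√T with e^{2y(t)τ*} ∈ [L², H²].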
On the other hand, for a fixed t the same increment can be estimated from below, and thus for any δ > 0 and T > T_0 we obtain the left-hand side of (20) as well. Moreover, for any β > 0 the constant B can be chosen so that (22) holds.

Example 4.2. Here we offer an example of the j.s.s.-G. stochastic process ξ with orthogonal increments in the formula (24), using the Itô–Nisio series (see [7] and references therein). Consider any orthonormal basis {ϕ_k, k ≥ 1} in L²(R_+) and a sequence {Z_k, k ≥ 1} of independent N(0, 1) r.v.'s. Then

w_0(t) = ∑_{k≥1} Z_k ∫_0^t ϕ_k(s) ds

is a standard Wiener process with covariances Ew_0(t)w_0(s) = min{t, s}. We need a version of the Wiener process on the whole real line R. Let {w_1(t), t ≥ 0}, {w_2(t), t ≥ 0} be two independent Wiener processes of the above form built from independent N(0, 1) r.v.'s {Z_ik, k ≥ 1}, i = 1, 2. Then the required Wiener process on R can be defined as w(t) = w_1(t), t ≥ 0, and w(t) = w_2(|t|), t < 0. For any disjoint real intervals the corresponding increments of w are uncorrelated, i.e. the increments are orthogonal. On the other hand, for any t > s, E(w(t) − w(s))² = t − s. Replacing the Gaussian coefficients Z_ik in the same series by independent non-Gaussian s.s.-G. r.v.'s yields processes ξ_1, ξ_2; set ξ(t) = ξ_1(t), t ≥ 0, and ξ(t) = ξ_2(|t|), t < 0. Then ξ = {ξ(t), t ∈ R} is a process with orthogonal increments and is not a Gaussian one.
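The covariance identity behind the Itô–Nisio construction can be verified numerically: E w_0(t)w_0(s) = Σ_k I_k(t)I_k(s) with I_k(t) = ∫_0^t ϕ_k(u) du, which equals ⟨1_{[0,t]}, 1_{[0,s]}⟩ = min(t, s) by Parseval, for any orthonormal basis. A minimal sketch using the cosine basis of L²[0, 1] (a concrete choice of ours; the paper works on L²(R_+)):

```python
import math

# Cosine basis of L^2[0, 1]: phi_0 = 1, phi_k(u) = sqrt(2) cos(k pi u), k >= 1,
# for which I_0(t) = t and I_k(t) = sqrt(2) sin(k pi t) / (k pi).
def cov(t: float, s: float, K: int = 5000) -> float:
    total = t * s  # contribution of phi_0: I_0(t) * I_0(s)
    for k in range(1, K + 1):
        total += 2.0 * math.sin(k * math.pi * t) * math.sin(k * math.pi * s) / (k * math.pi) ** 2
    return total

t, s = 0.3, 0.7
print(abs(cov(t, s) - min(t, s)) < 1e-3)  # True: the series recovers min(t, s)
```

The truncation error is bounded by Σ_{k>K} 2/(k²π²) ≈ 2/(π²K), i.e. about 4·10⁻⁵ for K = 5000.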