Gärtner-Ellis condition for squared asymptotically stationary Gaussian processes

The Gärtner-Ellis condition for the square of an asymptotically stationary Gaussian process is established. The same limit holds for the conditional distribution given any fixed initial point, which entails weak multiplicative ergodicity. The limit is shown to be the Laplace transform of a convolution of Gamma distributions with Poisson compounds of exponentials. A proof based on Wiener-Hopf factorization induces a probabilistic interpretation of the limit in terms of a regression problem.


Introduction
The convergence of the scaled cumulant generating functions of a sequence of random variables implies a large deviation principle; this is known as the Gärtner-Ellis condition [5, p. 43]. Our main result establishes that condition for the square of an asymptotically stationary Gaussian process. Reasons for studying squared Gaussian processes come from different fields: large deviation theory [16,4], time series analysis [9], or ancestry dependent branching processes [14]. Since only nonnegative real valued random variables are considered here, we shall use logarithms of Laplace transforms instead of cumulant generating functions. Let X = (X_t)_{t∈N} be a Gaussian process, asymptotically stationary in the sense of hypotheses (H1)-(H4), with asymptotic mean m_∞ and asymptotic covariance function k; denote by f the spectral density of k. For t ≥ 0, consider the following Laplace transform:

   L_t(α) = E[exp(−α(X_0^2 + · · · + X_{t−1}^2))] .   (1.2)

Theorem 1.1 states that for all α ≥ 0,

   lim_{t→+∞} (1/t) log L_t(α) = −ℓ(α) = −ℓ_0(α) − ℓ_1(α) ,

with:

   ℓ_0(α) = (1/4π) ∫_0^{2π} log(1 + 2αf(λ)) dλ ,   (1.4)

   ℓ_1(α) = α m_∞^2 / (1 + 2αf(0)) .   (1.5)

Proposition 1.2 deals with the conditional version:

   L_{t,x}(α) = E_x[exp(−α(X_0^2 + · · · + X_{t−1}^2))] ,   (1.6)

where E_x denotes the conditional expectation given X_0 = x. Then for all α ≥ 0 and all x ∈ R, (1/t) log L_{t,x}(α) tends to the same limit −ℓ(α). The analogue for finite state Markov chains has long been known [5, p. 72]. It was extended to strong multiplicative ergodicity of exponentially converging Markov chains by Meyn and his co-workers: see [13]. In [12], the square of a Gauss-Markov process was studied, strong multiplicative ergodicity was proved, and the limit was explicitly computed. This motivated the present generalization.
The particular case of a centered stationary process (m(t) = 0, K(s, t) = k(t − s)) can be considered classical: in that case, the limit (1.4) follows from Szegő's theorem on Toeplitz matrices: see [8], [3] as a general reference on Toeplitz matrices, and [1] for a review of probabilistic applications of Szegő's theory. The extension to the centered asymptotically stationary case follows from the notion of asymptotically equivalent matrices, in the L^2 sense: see section 7.4 p. 104 of [8], and [7]. The noncentered stationary case (m(t) = m_∞ and K(s, t) = k(t − s)) has received much less attention. In Proposition 2.2 of [4], the Large Deviation Principle is obtained for a squared noncentered stationary Gaussian process. There, the centered case is deduced from Szegő's theorem, while the noncentered case follows from the contraction principle.
A different approach to the noncentered stationary case is proposed here. Instead of the spectral decomposition and Szegő's theorem, a Wiener-Hopf factorization is used. The limits (1.4) and (1.5) are both deduced from the asymptotics of that factorization. The techniques are close to those developed in [11], which were used in [12]. One advantage is that the coefficients of the Wiener-Hopf factorization can be given a probabilistic interpretation in terms of a regression problem. This approach will be detailed in section 2.
To go from the stationary to the asymptotically stationary case, asymptotic equivalence of matrices is needed. But the classical L^2 definition of [7, section 2.3] does not suffice for the noncentered case. A stronger notion, linked to the L^1 norm of vectors instead of the L^2 norm, will be developed in section 3.
Joining the stationary case to asymptotic equivalence, one gets the conclusion of Theorem 1.1, but only for small enough values of α. To deduce that the convergence holds for all α ≥ 0, an extension of Lévy's continuity theorem will be used: if both (L_t(α))^{1/t} and e^{−ℓ(α)} are Laplace transforms of probability distributions on R_+, then the convergence over an interval implies weak convergence of measures, hence the convergence of Laplace transforms for all α ≥ 0. Actually, (L_t(α))^{1/t} and e^{−ℓ(α)} are both Laplace transforms of infinitely divisible distributions, more precisely convolutions of Gamma distributions with Poisson compounds of exponentials. Details will be given in section 4, together with the particular case of a Gauss-Markov process.

The stationary case
This section treats the stationary case: m(t) = m_∞ and K(s, t) = k(t − s). We shall denote by c_t = (m_∞)_{s=0,...,t−1} the constant vector with coordinates all equal to m_∞, and by H_t the Toeplitz matrix with symbol k: H_t = (k(s − r))_{s,r=0,...,t−1}. The main result of this section is a particular case of Theorem 1.1; it entails Proposition 2.2 of Bryc and Dembo [4]. Assume that k is summable, denote

   M = Σ_{s∈Z} |k(s)| < +∞ ,

and denote by f the corresponding spectral density:

   f(λ) = Σ_{s∈Z} k(s) e^{−iλs} .

Let Z = (Z_t)_{t∈Z} be a centered stationary process with covariance function k, let m_∞ be a real, and set X_t = m_∞ + Z_t. Then for all α such that 0 ≤ α < 1/(2M),

   lim_{t→+∞} (1/t) log L_t(α) = −ℓ_0(α) − ℓ_1(α) ,

where ℓ_0(α) and ℓ_1(α) are defined by (1.4) and (1.5).
Denote by m_t and K_t the mean and covariance matrix of the vector (X_s)_{s=0,...,t−1}. The Laplace transform of the squared norm of a Gaussian vector has a well known explicit expression: see for instance [16, p. 6]. Denoting by I_t the identity matrix indexed by 0, . . . , t−1 and by m* the transpose of a vector m:

   L_t(α) = det(I_t + 2αK_t)^{−1/2} exp(−α m_t* (I_t + 2αK_t)^{−1} m_t) .   (2.7)
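As a sanity check, the closed form for the Laplace transform of a squared Gaussian norm can be verified numerically. The sketch below (plain numpy, illustrative parameter values chosen here, not taken from the text) compares, in dimension one, the determinant-based expression with a direct quadrature of the Gaussian density.

```python
import numpy as np

def laplace_sq_norm(alpha, m, K):
    # Closed form (2.7): det(I + 2aK)^(-1/2) * exp(-a m* (I + 2aK)^(-1) m)
    A = np.eye(len(m)) + 2.0 * alpha * K
    return np.linalg.det(A) ** -0.5 * np.exp(-alpha * (m @ np.linalg.solve(A, m)))

# Dimension-one check against direct quadrature of the Gaussian density.
alpha, mu, v = 0.7, 1.3, 2.0
z, dz = np.linspace(mu - 30 * v ** 0.5, mu + 30 * v ** 0.5, 400001, retstep=True)
density = np.exp(-(z - mu) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
direct = float(np.sum(np.exp(-alpha * z ** 2) * density) * dz)
closed = float(laplace_sq_norm(alpha, np.array([mu]), np.array([[v]])))
print(direct, closed)
```

The quadrature grid comfortably covers the support of the density, so the two values agree to many digits.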
In the stationary case, m t = c t and K t = H t . From (2.7), we must prove that the following two limits hold.
   lim_{t→+∞} (1/2t) log det(I_t + 2αH_t) = ℓ_0(α) = (1/4π) ∫_0^{2π} log(1 + 2αf(λ)) dλ ,   (2.8)

and

   lim_{t→+∞} (α/t) c_t* (I_t + 2αH_t)^{−1} c_t = ℓ_1(α) = α m_∞^2 / (1 + 2αf(0)) .   (2.9)

Here, I_t + 2αH_t will be interpreted as the covariance matrix of the random vector (Y_s)_{s=0,...,t−1}, from the process

   Y_t = ε_t + √(2α) Z_t ,   (2.10)

where ε = (ε_t)_{t∈Z} is a sequence of i.i.d. standard normal random variables, independent from Z. The limits (2.8) and (2.9) will be deduced from a Cholesky decomposition of I_t + 2αH_t. We begin with an arbitrary positive definite matrix A, indexed by 0, . . . , n. The Cholesky decomposition writes A as the product of a lower triangular matrix by its transpose; thus A^{−1} is the product of an upper triangular matrix by its transpose. Write it as

   A^{−1}(r, s) = Σ_{t=max(r,s)}^{n} G(t, r) G(t, s) / G(t, t) ,   (2.12)

and for any vector m = (m(r)),

   m* A^{−1} m = Σ_{t=0}^{n} (1/G(t, t)) (Σ_{s=0}^{t} G(t, s) m(s))^2 .   (2.13)

Here is the probabilistic interpretation of the coefficients G(t, s). Consider a centered Gaussian vector Y with covariance matrix A. For t = 0, . . . , n, denote by Y_{0,t} the σ-algebra generated by Y_0, . . . , Y_t, and by ν_t the partial innovation:

   ν_t = Y_t − E[Y_t | Y_{0,t−1}] ,

with the convention ν_0 = Y_0. Using elementary properties of Gaussian vectors, it is easy to check that:

   ν_t = (1/G(t, t)) Σ_{s=0}^{t} G(t, s) Y_s .   (2.14)

Moreover, the ν_t's are independent, and the variance of ν_t is 1/G(t, t). When this is applied to A = I_t + 2αH_t, another interesting interpretation arises. For t = 0, . . . , n, (G(t, s))_{s=0,...,t} is the unique solution to the system:

   Σ_{s=0}^{t} G(t, s) A(s, r) = δ_{t,r} ,   r = 0, . . . , t .   (2.15)

Observe that the equations (2.15) are the normal equations of the regression of the ε_t's over the Y_t's in the model (2.10).
This means that:

   μ_t = E[ε_t | Y_{0,t}] = Σ_{s=0}^{t} G(t, s) Y_s .   (2.16)

Obviously, the μ_t's are independent, the variance of μ_t is G(t, t), and the filtering error is:

   E[(ε_t − μ_t)^2] = 1 − G(t, t) .

In particular, it follows that 0 < G(t, t) < 1.
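The regression interpretation of the coefficients G(t, s) lends itself to a direct numerical check. The sketch below (an illustrative AR(1)-type covariance and hypothetical parameter values, not taken from the text) solves the normal equations (2.15) for A = I + 2αH, then verifies the quadratic-form identity (2.13) and the bound 0 < G(t, t) < 1.

```python
import numpy as np

theta, alpha, n = 0.5, 0.4, 8
idx = np.arange(n)
H = theta ** np.abs(idx[:, None] - idx[None, :]) / (1 - theta ** 2)  # hypothetical k
A = np.eye(n) + 2 * alpha * H        # covariance of Y = eps + sqrt(2a) Z

# G(t, s): regression coefficients of eps_t on Y_0, ..., Y_t (normal equations (2.15))
G = np.zeros((n, n))
for t in range(n):
    e_t = np.zeros(t + 1)
    e_t[t] = 1.0
    G[t, : t + 1] = np.linalg.solve(A[: t + 1, : t + 1], e_t)

# Quadratic-form identity (2.13): m* A^{-1} m = sum_t (sum_s G(t,s) m(s))^2 / G(t,t)
m = np.cos(idx)                      # an arbitrary test vector
lhs = m @ np.linalg.solve(A, m)
rhs = sum((G[t, : t + 1] @ m[: t + 1]) ** 2 / G[t, t] for t in range(n))
print(lhs, rhs, G.diagonal())
```

The diagonal entries G(t, t) are the variances of the filtered estimates μ_t, hence they stay strictly between 0 and 1.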
The asymptotics of G(t, s) will now be related to the spectral density f. Denote g_t(s) = G(t, t − s). A change of index in (2.15) shows that (g_t(s))_{s=0,...,t} is the unique solution to the system:

   g_t(r) + 2α Σ_{s=0}^{t} g_t(s) k(r − s) = δ_{0,r} ,   r = 0, . . . , t .   (2.17)

Proposition 2.2. For all α such that 0 ≤ α < 1/(2M), the following equation has a unique solution (g(s))_{s≥0} in ℓ^1(N):

   g(r) + 2α Σ_{s=0}^{+∞} g(s) k(r − s) = δ_{0,r} ,   for all r ≥ 0 .   (2.18)

One has:

   g(0) = exp(−(1/2π) ∫_0^{2π} log(1 + 2αf(λ)) dλ) = e^{−2ℓ_0(α)} ,   (2.19)

and

   Σ_{s=0}^{+∞} g(s) = (1 + 2αf(0))^{−1/2} e^{−ℓ_0(α)} .   (2.20)

Moreover:

   lim_{t→+∞} g_t(s) = g(s) for all s ≥ 0 ,   (2.21)

and

   lim_{t→+∞} Σ_{s=0}^{t} |g(s) − g_t(s)| = 0 .   (2.22)

The main idea of the proof amounts to writing the Wiener-Hopf factorization of the operator I + 2αH. The method is originally due to M. G. Krein: see section 1.5 of [3], in particular the proof of Theorem 1.14 p. 17.
Proof. Conditions of invertibility for Toeplitz operators are well known: they are treated in sections 2.3 and 7.2 of [3]. Here, the ℓ^1 norm of the Toeplitz operator H with symbol k is M, and the condition 0 ≤ α < 1/(2M) permits to write the inverse as a norm convergent series:

   (I + 2αH)^{−1} = Σ_{n=0}^{+∞} (−2α)^n H^n .

That the truncated inverse (I_t + 2αH_t)^{−1} converges to (I + 2αH)^{−1} is deduced for the ℓ^2 case from [3, p. 42]. Convergence of entries follows, hence (2.21). To obtain (2.22), consider δ_t(s) = g(s) − g_t(s). From (2.17) and (2.18), for r = 0, . . . , t:

   δ_t(r) + 2α Σ_{s=0}^{t} δ_t(s) k(r − s) = −2α Σ_{s=t+1}^{+∞} g(s) k(r − s) .

Thus the following bound is obtained:

   Σ_{r=0}^{t} |δ_t(r)| ≤ 2αM (Σ_{s=0}^{t} |δ_t(s)| + Σ_{s=t+1}^{+∞} |g(s)|) ,

hence (2.22), since 2αM < 1 and g is summable. The generating function of (g(s))_{s≥0} will now be related to the spectral density f. Define for all s ∈ Z:

   g_+(s) = g(s) if s ≥ 0, 0 otherwise;   g_−(s) = 2α Σ_{r=0}^{+∞} g(r) k(s − r) if s < 0, 0 otherwise.

Denote by F_+ and F_− the Fourier transforms of g_+ and g_−. Taking Fourier transforms in both members of (2.18), extended to all r ∈ Z:

   F_+(λ) (1 + 2αf(λ)) = 1 + F_−(λ) .   (2.23)

The functions F_±(λ) can be seen as being defined on the unit circle. They can be extended into analytic functions, one inside the unit disk, the other one outside. For |ζ| ≤ 1:

   F̂_+(ζ) = Σ_{s≥0} g(s) ζ^s ,

and for |ζ| ≥ 1:

   F̂_−(ζ) = Σ_{s<0} g_−(s) ζ^s .

Similarly, we shall denote by f̂(ζ) the value of f(λ) for ζ = e^{iλ}. Let ζ_0 be any fixed complex inside the unit disk. In (2.23), take logarithms of both members (principal branch), divide by 2πi(ζ − ζ_0), then take the contour integral over the unit circle:

   (1/2πi) ∮ (log F̂_+(ζ) + log(1 + 2αf̂(ζ)))/(ζ − ζ_0) dζ = (1/2πi) ∮ log(1 + F̂_−(ζ))/(ζ − ζ_0) dζ .   (2.24)
Since log F̂_+ is analytic inside the unit disk (F̂_+ does not vanish there), the residue of the first integral is log F̂_+(ζ_0). Let us prove that the integral in the second member is null. Since the function to be integrated is analytic outside the unit circle, the integral has the same value over any circle with radius ρ > 1, centered at 0.
As ρ tends to +∞, F̂_−(ζ) tends to 0, so that |log(1 + F̂_−(ζ))/(ζ − ζ_0)| decreases faster than 1/|ζ|: the right hand side tends to zero, and the integral is null. Thus (2.24) becomes:

   log F̂_+(ζ_0) = −(1/2πi) ∮ log(1 + 2αf̂(ζ))/(ζ − ζ_0) dζ .

Two particular cases are of interest. Consider first ζ_0 = 0. Since F̂_+(0) = g(0), and

   (1/2πi) ∮ log(1 + 2αf̂(ζ))/ζ dζ = (1/2π) ∫_0^{2π} log(1 + 2αf(λ)) dλ ,

one gets (2.19). The other particular case is ζ_0 = 1, but it is on the unit circle; so a limit has to be taken.
Since f̂ has no singularity at 1, as ζ_0 tends to 1 from inside the disk, the limit of the contour integral is, by the Sokhotski-Plemelj formula, the sum of the principal value integral and of one half of the value of log(1 + 2αf̂) at ζ = 1:

   log F̂_+(1) = −PV (1/2πi) ∮ log(1 + 2αf̂(ζ))/(ζ − 1) dζ − (1/2) log(1 + 2αf(0)) .

Written as a real integral:

   PV (1/2πi) ∮ log(1 + 2αf̂(ζ))/(ζ − 1) dζ = PV (1/2π) ∫_0^{2π} log(1 + 2αf(λ)) (1/(1 − e^{−iλ})) dλ .

For all λ, the real part of 1/(1 − e^{−iλ}) is 1/2. The integral of the imaginary part vanishes, because the function to be integrated is odd. Finally:

   log F̂_+(1) = −(1/4π) ∫_0^{2π} log(1 + 2αf(λ)) dλ − (1/2) log(1 + 2αf(0)) = −ℓ_0(α) − (1/2) log(1 + 2αf(0)) ,

hence (2.20), since F̂_+(1) = Σ_{s≥0} g(s). Here is the probabilistic interpretation. Consider a centered stationary process (Y_t)_{t∈Z}, with covariance function A(t, s) = a(t − s). For s ≤ t, denote by Y_{s,t} the σ-algebra generated by (Y_r)_{r=s,...,t}. Consider again the partial innovation (2.14); using stationarity, ν_t has the same distribution as:

   η_t = Y_0 − E[Y_0 | Y_{−t,−1}] .

As t tends to infinity, η_t converges almost surely to:

   η_∞ = Y_0 − E[Y_0 | Y_{−∞,−1}] ,

which is the innovation process associated to Y. Now the variance of ν_t, 1/G(t, t), tends to the variance of η_∞. From the Szegő-Kolmogorov formula (see e.g. Theorem 3 p. 137 of [9]), that variance is:

   exp((1/2π) ∫_0^{2π} log φ(λ) dλ) ,

where φ(λ) is the spectral density of Y. Let Z be a centered stationary process with covariance function k, ε be a standard Gaussian noise, and Y = ε + √(2α) Z. The spectral densities φ of Y and f of Z are related by φ(λ) = 1 + 2αf(λ). Hence:

   lim_{t→+∞} G(t, t) = exp(−(1/2π) ∫_0^{2π} log(1 + 2αf(λ)) dλ) ,

which is equivalent to (2.19). Alternatively, observe that due to stationarity, μ_t defined by (2.16) has the same distribution as:

   ξ_t = E[ε_0 | Y_{−t,0}] = Σ_{s=0}^{t} g_t(s) Y_{−s} .

As t tends to infinity, ξ_t converges a.s. to:

   ξ_∞ = E[ε_0 | Y_{−∞,0}] .

Of course, since E[ε_{−s} Y_{−r}] = δ_{s,r}, the g_t(s) are the solution of the normal equations (2.17). Hence the limiting property (2.21) says that ξ_∞ admits the representation:

   ξ_∞ = Σ_{s=0}^{+∞} g(s) Y_{−s} .

Similarly, for all t,

   E[ε_t | Y_{−∞,t}] = Σ_{s=0}^{+∞} g(s) Y_{t−s} ,

which means that (g(s)) realizes the optimal causal Wiener filter of ε_t from the Y_{t−s}'s.
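The two constants of Proposition 2.2 can be checked numerically by solving a truncated version of the system. In the sketch below (an AR(1)-type covariance and illustrative parameters, chosen so that 2αM < 1), the truncated solution g_t of (2.17) is compared with the values g(0) = exp(−(1/2π)∫ log(1 + 2αf)) and Σ g(s) = (1 + 2αf(0))^{−1/2} e^{−ℓ_0(α)} of (2.19) and (2.20).

```python
import numpy as np

theta, alpha, t = 0.5, 0.1, 500        # here M = sum |k(s)| = 4, so 2*alpha*M < 1
idx = np.arange(t)
H = theta ** np.abs(idx[:, None] - idx[None, :]) / (1 - theta ** 2)
A = np.eye(t) + 2 * alpha * H
e0 = np.zeros(t)
e0[0] = 1.0
g = np.linalg.solve(A, e0)             # truncated solution g_t of (2.17)

f = lambda lam: 1.0 / (1 - 2 * theta * np.cos(lam) + theta ** 2)   # spectral density
lam = np.linspace(0.0, 2 * np.pi, 20000, endpoint=False)
ell0 = np.mean(np.log(1 + 2 * alpha * f(lam))) / 2                 # (1/4pi) integral

g0_pred = np.exp(-2 * ell0)                                   # (2.19)
gsum_pred = np.exp(-ell0) / np.sqrt(1 + 2 * alpha * f(0.0))   # (2.20)
print(g[0], g0_pred, g.sum(), gsum_pred)
```

The truncation error decays geometrically, so at t = 500 the agreement is already far below the tolerances used.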
Applying now (2.13) to A = I_t + 2αH_t, one gets:

   c_t* (I_t + 2αH_t)^{−1} c_t = m_∞^2 Σ_{r=0}^{t−1} (1/g_r(0)) (Σ_{s=0}^{r} g_r(s))^2 .

From Proposition 2.2, as r tends to infinity, each term of the sum converges to

   m_∞^2 (Σ_{s=0}^{+∞} g(s))^2 / g(0) = m_∞^2 / (1 + 2αf(0)) ,

and (2.9) follows by Cesàro convergence.
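Both limits (2.8) and (2.9) can already be observed at moderate matrix sizes. A minimal sketch, with an AR(1)-type covariance and illustrative parameter values chosen so that 2αM < 1:

```python
import numpy as np

theta, alpha, m_inf, t = 0.5, 0.1, 1.5, 1500
idx = np.arange(t)
H = theta ** np.abs(idx[:, None] - idx[None, :]) / (1 - theta ** 2)
A = np.eye(t) + 2 * alpha * H
f = lambda lam: 1.0 / (1 - 2 * theta * np.cos(lam) + theta ** 2)
lam = np.linspace(0.0, 2 * np.pi, 20000, endpoint=False)

ell0 = np.mean(np.log(1 + 2 * alpha * f(lam))) / 2        # right-hand side of (2.8)
ell1 = alpha * m_inf ** 2 / (1 + 2 * alpha * f(0.0))      # right-hand side of (2.9)
c = np.full(t, m_inf)
lhs0 = np.linalg.slogdet(A)[1] / (2 * t)                  # (1/2t) log det(I + 2aH_t)
lhs1 = alpha * (c @ np.linalg.solve(A, c)) / t            # (a/t) c* (I + 2aH_t)^{-1} c
print(lhs0, ell0, lhs1, ell1)
```

The finite-t quantities differ from their limits by boundary terms of order 1/t.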

Asymptotic equivalence
For symmetric matrices, the norm subordinate to ‖·‖_∞ is equal to the norm subordinate to ‖·‖_1. It will be denoted by ‖·‖ and referred to as the strong norm. For A = (A(s, r))_{s,r=0,...,t−1} such that A* = A:

   ‖A‖ = max_{s=0,...,t−1} Σ_{r=0}^{t−1} |A(s, r)| .

The following weak norm will be denoted by |A|:

   |A| = (1/t) Σ_{s=0}^{t−1} Σ_{r=0}^{t−1} |A(s, r)| .
Clearly, |A| ≤ ‖A‖. Moreover, the following bounds hold (Lemma 3.1): for symmetric matrices A and B,

   |AB| ≤ ‖A‖ |B|   and   |AB| ≤ |A| ‖B‖ .

Here is a definition of asymptotic equivalence for vectors.

Definition 3.2.
Let (v_t)_{t≥0} and (w_t)_{t≥0} be two sequences of vectors such that for all t ≥ 0, v_t = (v_t(s))_{s=0,...,t−1} and w_t = (w_t(s))_{s=0,...,t−1}. They are said to be asymptotically equivalent if ‖v_t‖_∞ and ‖w_t‖_∞ are uniformly bounded, and:

   lim_{t→+∞} (1/t) Σ_{s=0}^{t−1} |v_t(s) − w_t(s)| = 0 .

Asymptotic equivalence of (v_t) and (w_t) will be denoted by v_t ∼ w_t.
Hypotheses (H1) and (H4) imply that m_t ∼ c_t. Asymptotic equivalence for matrices is defined as follows (compare with [7, p. 172]): two sequences (A_t) and (B_t) of symmetric matrices, with ‖A_t‖ and ‖B_t‖ uniformly bounded, are said to be asymptotically equivalent if |A_t − B_t| tends to 0. Asymptotic equivalence of (A_t) and (B_t) will still be denoted by A_t ∼ B_t.
Here are some elementary results, analogous to those stated in Theorem 1 p. 172 of [7].

Lemma 3.4. Let (A_t), (B_t), (C_t), (D_t) be sequences of symmetric matrices, with uniformly bounded strong norms.
1. If A_t ∼ B_t and B_t ∼ C_t, then A_t ∼ C_t.
2. If A_t ∼ B_t and C_t ∼ D_t, then A_t + C_t ∼ B_t + D_t.
3. If A_t ∼ B_t and C_t ∼ D_t, then A_t C_t ∼ B_t D_t.
4. If A_t ∼ B_t and F is an analytic function with radius of convergence R > sup_t max(‖A_t‖, ‖B_t‖), then F(A_t) ∼ F(B_t).
Proof. Points 1 and 2 follow from the triangle inequality for the weak norm. For point 3, because ‖·‖ is a norm of matrices, ‖A_t C_t‖ ≤ ‖A_t‖ ‖C_t‖ and ‖B_t D_t‖ ≤ ‖B_t‖ ‖D_t‖ are uniformly bounded. Moreover by Lemma 3.1,

   |A_t C_t − B_t D_t| ≤ |(A_t − B_t) C_t| + |B_t (C_t − D_t)| ≤ |A_t − B_t| ‖C_t‖ + ‖B_t‖ |C_t − D_t| .

Since ‖C_t‖ and ‖B_t‖ are uniformly bounded, the result follows. For point 4, let F be analytic with radius of convergence R. For |z| < R, let

   F_n(z) = Σ_{k=0}^{n} a_k z^k

be the n-th partial sum of the series of F. The matrices F(A_t), F(B_t) are defined as the limits of F_n(A_t), F_n(B_t); from the hypothesis, it follows that the convergence is uniform in t. Because ‖·‖ is a matrix norm, ‖F(A_t)‖ and ‖F(B_t)‖ are uniformly bounded. Let ǫ be a positive real. Fix n such that for all t:

   |F(A_t) − F_n(A_t)| < ǫ/3   and   |F(B_t) − F_n(B_t)| < ǫ/3 .

By induction on n, using points 2 and 3, F_n(A_t) ∼ F_n(B_t). There exists t_0 such that for all t > t_0:

   |F_n(A_t) − F_n(B_t)| < ǫ/3 .

Thus for all t > t_0:

   |F(A_t) − F(B_t)| < ǫ .

Hence the result.
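The stability of asymptotic equivalence under analytic functions can be illustrated numerically: a Toeplitz matrix and its wrap-around (circulant-like) modification are asymptotically equivalent, and so are their images under F(z) = (1 + 2αz)^{−1}. The sketch below assumes the L^1-flavored weak norm |A| = (1/t) Σ |A(s, r)| and uses illustrative parameters.

```python
import numpy as np

def weak_norm(A):
    # L1-flavored weak norm: |A| = (1/t) * sum of absolute entries
    return float(np.abs(A).sum() / A.shape[0])

def pair(t, theta=0.5, alpha=0.1):
    idx = np.arange(t)
    d = idx[:, None] - idx[None, :]
    k = lambda u: theta ** np.abs(u) / (1 - theta ** 2)
    T = np.eye(t) + 2 * alpha * k(d)                          # Toeplitz
    C = np.eye(t) + 2 * alpha * (k(d) + k(d - t) + k(d + t))  # wrap-around version
    return T, C

# T_t ~ C_t, and applying F(z) = (1+2az)^{-1} (i.e. taking inverses) preserves
# the equivalence: the weak norm of the difference shrinks as t grows.
gaps = []
for t in (200, 400):
    T, C = pair(t)
    assert weak_norm(T - C) < 0.05
    gaps.append(weak_norm(np.linalg.inv(T) - np.linalg.inv(C)))
print(gaps)
```

The strong norms of all four matrices stay uniformly bounded, which is exactly the hypothesis of Lemma 3.4.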

Lemma 3.5. If v_t ∼ w_t and ‖A_t‖ is uniformly bounded, then:
1. A_t v_t ∼ A_t w_t .
2. If moreover A_t ∼ B_t, then A_t v_t ∼ B_t w_t .
Proof. That ‖A_t v_t‖_∞ and ‖A_t w_t‖_∞ are uniformly bounded comes from the fact that ‖·‖ is subordinate to ‖·‖_∞. Next for point 1:

   (1/t) Σ_{s=0}^{t−1} |(A_t(v_t − w_t))(s)| ≤ ‖A_t‖ (1/t) Σ_{s=0}^{t−1} |v_t(s) − w_t(s)| .

For point 2:

   (1/t) Σ_{s=0}^{t−1} |(A_t v_t − B_t w_t)(s)| ≤ ‖A_t‖ (1/t) Σ_{s=0}^{t−1} |v_t(s) − w_t(s)| + ‖w_t‖_∞ |A_t − B_t| .

The relation between asymptotic equivalence of vectors and our goal is the following.

Lemma 3.6.
If v_t ∼ w_t and u_t ∼ z_t, then:

   lim_{t→+∞} (1/t) (v_t* u_t − w_t* z_t) = 0 .

Proof. Writing v_t* u_t − w_t* z_t = v_t* (u_t − z_t) + (v_t − w_t)* z_t, one gets:

   (1/t) |v_t* u_t − w_t* z_t| ≤ ‖v_t‖_∞ (1/t) Σ_{s=0}^{t−1} |u_t(s) − z_t(s)| + ‖z_t‖_∞ (1/t) Σ_{s=0}^{t−1} |v_t(s) − w_t(s)| .

Hence the result.
Using asymptotic equivalence, (3.26) and (3.27) can easily be deduced from (2.8) and (2.9), for 0 < α < 1/(2M). We shall not detail the passage from (2.8) to (3.26): see Theorem 4 p. 178 of [7]. Here is the passage from (2.9) to (3.27). For all α < 1/(2M), the function F(z) = (1 + 2αz)^{−1} is analytic with radius of convergence larger than M; since K_t ∼ H_t, point 4 of Lemma 3.4 yields (I_t + 2αK_t)^{−1} ∼ (I_t + 2αH_t)^{−1}. It follows from (3.28) by point 1 of Lemma 3.5 that

   (I_t + 2αK_t)^{−1} m_t ∼ (I_t + 2αK_t)^{−1} c_t .

By point 2 of Lemma 3.5:

   (I_t + 2αK_t)^{−1} m_t ∼ (I_t + 2αH_t)^{−1} c_t .

Lemma 3.6 implies:

   lim_{t→+∞} (α/t) (m_t* (I_t + 2αK_t)^{−1} m_t − c_t* (I_t + 2αH_t)^{−1} c_t) = 0 .

Still using asymptotic equivalence, it will now be shown that Proposition 1.2 is just a particular case of Theorem 1.1. Indeed, consider the Gaussian process X^x with mean

   m_x(t) = m(t) + (x − m(0)) K(0, t)/K(0, 0) ,   (3.29)

and covariance function

   K°(s, t) = K(s, t) − K(0, s) K(0, t)/K(0, 0) .   (3.30)

The distribution of (X^x_t)_{t∈N} and the conditional distribution of (X_t)_{t∈N} given X_0 = x are the same. Denote by m_{x,t} and K°_t the mean and covariance matrix of (X^x_s)_{s=0,...,t−1}. Theorem 1.1 applies to X^x, provided it is proved that m_{x,t} ∼ c_t and K°_t ∼ H_t. By (H1) and (H2), ‖m_{x,t}‖_∞ is uniformly bounded. Moreover from (3.29),

   (1/t) Σ_{s=0}^{t−1} |m_x(s) − m(s)| = (|x − m(0)|/K(0, 0)) (1/t) Σ_{s=0}^{t−1} |K(0, s)| ,

which tends to 0; thus m_{x,t} ∼ m_t, hence m_{x,t} ∼ c_t by transitivity. Now from (3.30),

   |K°_t − K_t| = (1/(t K(0, 0))) Σ_{s=0}^{t−1} Σ_{r=0}^{t−1} |K(0, s)| |K(0, r)| ,

which tends to 0; moreover ‖K°_t‖ is uniformly bounded. Thus K°_t ∼ K_t, hence K°_t ∼ H_t by transitivity. This section will end with another illustration of asymptotic equivalence, which is of independent interest and yields an alternative proof of (2.9).

Proposition 3.7. Let λ ∈ [0, 2π) and denote by d_t^{(λ)} the vector (e^{−iλs})_{s=0,...,t−1}. Let F be an analytic function with radius of convergence larger than M. Then:

   lim_{t→+∞} (1/t) d_t^{(λ)*} F(H_t) d_t^{(λ)} = F(f(λ)) .   (3.31)

The function (e^{−iλs})_{s∈Z} is an eigenfunction of the Toeplitz operator H with symbol k, associated to the eigenvalue f(λ). Thus Proposition 3.7 is closely related to Szegő's theorem: compare with Theorem 5.9 p. 137 of [3]. Notice that c_t = m_∞ d_t^{(0)}: in the particular case λ = 0 and F(z) = (1 + 2αz)^{−1}, one gets:

   lim_{t→+∞} (1/t) c_t* (I_t + 2αH_t)^{−1} c_t = m_∞^2 / (1 + 2αf(0)) ,

from which (2.9) follows, through Lemma 3.6. If instead of being constant, the asymptotic mean is periodic, Proposition 3.7 still gives an explicit expression of ℓ_1(α). As an example, assume m(t) = (−1)^t m_∞. Then (2.9) holds with:

   ℓ_1(α) = α m_∞^2 / (1 + 2αf(π)) .

Proof. We first prove (3.31) for F(z) = z. One has:

   (1/t) d_t^{(λ)*} H_t d_t^{(λ)} = (1/t) Σ_{s=0}^{t−1} Σ_{r=0}^{t−1} k(s − r) e^{iλ(s−r)} .

For each s, the inner sum over r differs from f(λ) by two tails; denote δ_+(s) = Σ_{r≥s} k(r) e^{iλr}, and δ_−(s) = Σ_{r≤s} k(r) e^{iλr} for s < 0. Observe, due to the symmetry of k, that |δ_−(−s)| = |δ_+(s)|. Thus:

   |(1/t) d_t^{(λ)*} H_t d_t^{(λ)} − f(λ)| ≤ (1/t) Σ_{s=0}^{t−1} (|δ_+(s + 1)| + |δ_+(t − s)|) .

The sequence (|δ_+(s)|)_{s∈N} tends to 0, as a consequence of the summability of k (H3).
Therefore it also tends to zero in the Cesàro sense. Hence the result.
By induction, using the triangle inequality, (3.31) holds for any polynomial F n . The rest of the proof is the same as that of point 4 in Lemma 3.4.
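Proposition 3.7 is easy to test numerically. The sketch below takes λ = π (the alternating vector of the periodic-mean example) and F(z) = (1 + 2αz)^{−1}, with illustrative AR(1)-type parameters.

```python
import numpy as np

theta, alpha, t = 0.5, 0.1, 1500
idx = np.arange(t)
H = theta ** np.abs(idx[:, None] - idx[None, :]) / (1 - theta ** 2)
f_pi = 1.0 / (1 + theta) ** 2              # f(pi) = 1/(1 + 2*theta + theta^2)
d = (-1.0) ** idx                          # d_t^{(pi)}: alternating signs
A = np.eye(t) + 2 * alpha * H              # F(H_t) = A^{-1} for F(z) = (1+2az)^{-1}
quad = float(d @ np.linalg.solve(A, d)) / t
target = 1.0 / (1 + 2 * alpha * f_pi)      # F(f(pi))
print(quad, target)
```

At t = 1500 the finite quadratic form agrees with F(f(π)) up to a boundary term of order 1/t.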

Asymptotic distributions
The results of the two previous sections establish that the conclusion of Theorem 1.1 holds for small enough values of α. To finish the proof, the convergence must be extended to all α ≥ 0. The following variant of Lévy's continuity theorem applies (see Chapter 4 of [10], in particular Exercise 9 p. 78). Let (π_n) be a sequence of probability measures on R_+, and assume that their Laplace transforms converge, on some interval of values of α, to the Laplace transform of a probability measure π on R_+. Then (π_n) converges weakly to π, and the convergence of Laplace transforms holds for all α ≥ 0.
To apply this lemma, one has to check that (L_t(α))^{1/t} and e^{−ℓ(α)} are the Laplace transforms of probability distributions on R_+. It turns out that in our case, the function L_t(α) defined by (1.2) is the Laplace transform of an infinitely divisible distribution, thus so are (L_t(α))^{1/t} and its limit. We give here the probabilistic interpretation of e^{−ℓ_0(α)} and e^{−ℓ_1(α)} as the Laplace transforms of two infinitely divisible distributions. Next, the particular case of a Gauss-Markov process will be considered.
Through an orthogonal transformation diagonalizing its covariance matrix, the squared norm of any Gaussian vector can be written as the sum of independent random variables, each being the square of a Gaussian variable, thus having a noncentral chi-squared distribution. If Z is Gaussian with mean μ and variance v, the Laplace transform of Z^2 is:

   E[e^{−αZ^2}] = (1 + 2αv)^{−1/2} exp(−α μ^2/(1 + 2αv)) .

The first factor is the Laplace transform of the Gamma distribution with shape parameter 1/2 and scale parameter 2v. Assuming μ and v non null, rewrite the second factor as:

   exp(−(μ^2/(2v)) (1 − 1/(1 + 2αv))) .

This is the Laplace transform of a Poisson compound of the exponential with expectation 2v, by the Poisson distribution with rate μ^2/(2v). Therefore, the squared norm of a Gaussian vector has an infinitely divisible distribution, which is a convolution of Gamma distributions with Poisson compounds of exponentials. Squared Gaussian vectors have received a lot of attention, since even in dimension 2, the mean and covariance matrix must satisfy certain conditions for the distribution of the squared vector to be infinitely divisible [15]. Yet the sum of the squared coordinates of such a vector always has an infinitely divisible distribution.
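The factorization of the Laplace transform of Z^2 into a Gamma factor and a compound Poisson factor is an algebraic identity that can be checked directly (illustrative values of μ and v):

```python
import math

mu, v = 1.3, 0.8                      # hypothetical mean and variance
for alpha in (0.0, 0.3, 1.7, 5.0):
    full = (1 + 2 * alpha * v) ** -0.5 * math.exp(-alpha * mu ** 2 / (1 + 2 * alpha * v))
    gamma_part = (1 + 2 * alpha * v) ** -0.5      # Gamma(1/2), scale 2v
    rate = mu ** 2 / (2 * v)                      # Poisson parameter mu^2/(2v)
    exp_lt = 1.0 / (1 + 2 * alpha * v)            # exponential with expectation 2v
    poisson_part = math.exp(-rate * (1 - exp_lt)) # compound Poisson factor
    assert abs(full - gamma_part * poisson_part) < 1e-12
print("factorization holds")
```

Indeed, rate · (1 − exp_lt) = (μ^2/2v) · 2αv/(1 + 2αv) = αμ^2/(1 + 2αv), so the two expressions coincide exactly.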
For all t, the distribution with Laplace transform (L_t(α))^{1/t} is the convolution of Gamma distributions with Poisson compounds of exponentials. As t tends to infinity, (L_t(α))^{1/t} tends to e^{−ℓ_0(α)} e^{−ℓ_1(α)}. The first factor e^{−ℓ_0(α)} is the Laplace transform of a limit of convolutions of Gamma distributions, which belongs to the Thorin class T(R_+) (see [2] as a general reference). Consider now e^{−ℓ_1(α)}. Rewrite ℓ_1(α) as:

   ℓ_1(α) = (m_∞^2/(2f(0))) (1 − 1/(1 + 2αf(0))) .

Thus e^{−ℓ_1(α)} is the Laplace transform of a Poisson compound of the exponential distribution with expectation 2f(0), by the Poisson distribution with parameter m_∞^2/(2f(0)). As an illustrating example, consider the Gauss-Markov process defined as follows. Let θ be a real such that −1 < θ < 1. Let (ε_t)_{t≥1} be a sequence of i.i.d. standard Gaussian random variables. Let Y_0, independent from the sequence (ε_t)_{t≥1}, follow the normal N(0, (1 − θ^2)^{−1}) distribution. For all t ≥ 1 let:

   Y_t = θ Y_{t−1} + ε_t .

Thus (Y_t)_{t∈N} is a stationary centered auto-regressive process. Consider the noncentered process (X_t)_{t∈N}, with X_t = Y_t + m_∞. This is the case considered in [12], where a stronger result was proved. Formula (10) p. 72 of that reference matches (1.4) and (1.5) here. Indeed, the covariance function is k(s) = θ^{|s|}/(1 − θ^2), and the spectral density is:

   f(λ) = 1/(1 − 2θ cos λ + θ^2) .
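For this spectral density, the integral (1.4) can be evaluated via the classical geometric-mean identity (1/2π) ∫ log(c − 2θ cos λ) dλ = log((c + √(c^2 − 4θ^2))/2). A numerical cross-check with illustrative values of θ and α (the closed form below is a derived convenience, not a formula quoted from [12]):

```python
import math

theta, alpha = 0.5, 0.1               # illustrative values, with 2*alpha*M < 1
c = 1 + theta ** 2 + 2 * alpha
n = 20000
acc = 0.0
for j in range(n):                    # Riemann sum of (1/4pi) int log(1 + 2a f)
    lam = 2 * math.pi * j / n
    f = 1.0 / (1 - 2 * theta * math.cos(lam) + theta ** 2)
    acc += math.log(1 + 2 * alpha * f)
ell0_num = acc / (2 * n)
# 1 + 2a f = (c - 2 theta cos) / (1 + theta^2 - 2 theta cos), and the denominator
# has geometric mean 1; hence the closed form:
ell0_closed = 0.5 * math.log((c + math.sqrt(c * c - 4 * theta * theta)) / 2)
print(ell0_num, ell0_closed)
```

The Riemann sum of a smooth periodic integrand converges spectrally fast, so the two values agree essentially to machine precision.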
Write ℓ 0 (α) as a contour integral over the unit circle.