Modeling temporally uncorrelated components for complex-valued stationary processes

We consider a complex-valued linear mixture model, under discrete weakly stationary processes. We recover latent components of interest, which have undergone a linear mixing. We study asymptotic properties of a classical unmixing estimator, that is based on simultaneous diagonalization of the covariance matrix and an autocovariance matrix with lag $\tau$. Our main contribution is that our asymptotic results can be applied to a large class of processes. In related literature, the processes are typically assumed to have weak correlations. We extend this class and consider the unmixing estimator under stronger dependency structures. In particular, we analyze the asymptotic behavior of the unmixing estimator under both, long- and short-range dependent complex-valued processes. Consequently, our theory covers unmixing estimators that converge slower than the usual $\sqrt{T}$ and unmixing estimators that produce non-Gaussian asymptotic distributions. The presented methodology is a powerful prepossessing tool and highly applicable in several fields of statistics. Complex-valued processes are frequently encountered in, for example, biomedical applications and signal processing. In addition, our approach can be applied to model real-valued problems that involve temporally uncorrelated pairs. These are encountered in, for example, applications in finance.


Introduction
In modern statistics, there is an increasing demand for theoretically solid methodology, which can be applied beyond standard real-valued data, into the realm of more exotic data structures. In this paper, we consider a complex-valued linear mixture model, based on temporally uncorrelated components, in the context of discrete weakly stationary processes. We aim to find latent processes of interest, when only linear mixtures of them are observable. The recovery of the latent processes of interest is referred to as the unmixing procedure. The properties of the recovered processes themselves can be the source of interest, or alternatively, the approach can be used to reduce multivariate models into several univariate models, which can simplify the modeling burden.
In the context of linear mixture models, the assumption of uncorrelated components, or stronger conditions that imply uncorrelated components, is considered to be natural in several applications, for example, in finance and signal processing, see the article Fan et al. (2008) and the books Alexander (2001); Comon and Jutten (2010). Applications, where the observed time series are naturally complex-valued are frequent in, e.g., signal processing and biomedical applications, see Zarzoso and Comon (2009) ;Li et al. (2011). In such applications, the interest can lie in the shapes of the unobservable signals, such as, the shapes of three dimensional image valued signals, which correspond to different parts of the brain under different activities. In the signal processing literature, the problem is often referred to as the blind source separation (BSS) problem. In the BSS literature, it has been argued that discarding the special complex-valued structure can lead to lost information, see Adali and Schreier (2014). Thus, it is often beneficial to directly work with complex-valued signals in such applications. We wish to emphasize that our C d -valued model does not directly correspond to existing R 2d valued models, e.g., the one in Miettinen et al. (2012). For a collection of BSS applications, see Comon and Jutten (2010).
In parallel to the signal processing community, linear mixture models, with similar model assumptions as in BSS literature, have recently received notable attention in finance, see for example Fan et al. (2008); Matteson and Tsay (2011). In financial applications, the term blind source separation is rarely used, and usually more descriptive model naming conventions are utilized. Note that, our complex-valued approach is highly applicable in many real-valued financial applications. In our approach, we assume little concerning the relationship between the real-and imaginary-parts of a single component. Thus, from the real-valued perspective, the problem can be equivalently considered as modeling real-valued temporally uncorrelated pairs, where the elements contained in a single pair are not necessarily uncorrelated. It turns out, that the algebra is often nicer in the complex-valued case, when compared to the notation required by the real-valued counterpart.
Once the temporally uncorrelated components are recovered, one can then, for example, model volatilities bivariately (or univariately) or asses risk by modeling tail behavior with the tools of bivariate (or univariate) extreme value theory. Moreover, our approach is natural in applications, where a single observation is vector valued, that is, the observations have both a magnitude and a direction, e.g., modeling the latent components of wind at an airport.
The main focus in this paper is on asymptotic behavior of a classical unmixing estimator. We consider an algorithm that is identical to the so-called Algorithm for Multiple Unknown Signals Extraction (AMUSE), Tong et al. (1990). In the financial side, asymptotic properties of an alternative approach, with slightly differing model assumptions compared to our approach, are given in, e.g., Fan et al. (2008). Asymptotics, in the context of real-valued BSS, have been considered in, e.g., Miettinen et al. (2012Miettinen et al. ( , 2016Miettinen et al. ( , 2019. Compared to the above mentioned approaches, we consider a substantially wider class of processes. In particular, we analyze the asymptotic behavior of the corresponding estimators under both, long-and short-range dependent complex-valued processes. Furthermore, we take a semi-parametric approach, that is, our theory is not limited to specific parametric family of distributions. As a pinnacle of the novel theory presented in this paper, we consider processes without summable autocovariance structures, which causes the limiting distribution of the corresponding unmixing estimator to be non-Gaussian. Instead of using the classical central limit theorem, we take a modern approach and utilize general central limit theorem type results that are also applicable for processes with strong temporal dependencies. Moreover, we consider convergence rates that differ from the usual √ T . We wish to emphasize that modeling long-range dependent processes, that is, processes without summable autocovariance structures, are of paramount importance in many financial applications.
The paper is structured as follows. In Section 2, we recall some basic theory regarding complex-valued random variables and we introduce the notation that we follow in this paper. In Section 3, we present the temporally uncorrelated components mixing model and solutions to the corresponding unmixing problem at the population level. In Section 4, we formulate the estimation procedure and study the asymptotic properties of the estimators. In Section 5, we a real-data example involving photographs, which can be presented as complex-valued time series. All the proofs of this paper are presented in Appendix A.

Random variables in the complex plane
In this section, we review some basic definitions and classical estimators for complex-valued random variables. Additionally, we recall the definition of the complex multivariate Gaussian distribution and the classical complex-valued central limit theorem.
Throughout, let (Ω, F , P) be a common probability space and let z • := (z t ) t∈N be a collection of random variables z t : where pr k is a projection to the kth complex coordinate. We refer to the process z (k) t ) t∈N as the kth component of the C d -valued stochastic process z • . Note that the components can be expressed in the form z t are real-valued ∀k ∈ {1, 2, . . . , d} and i is the imaginary unit. We denote the complex conjugate of z t as z * t and we denote the conjugate transpose of z t as z H t := (z * t ) . Furthermore, we use B(C d ) and B(R 2d ) to denote the Borel-sigma algebras on C d and R 2d , respectively. We use the following notation for the multivariate mean and the unsymmetrized autocovariance matrix with lag τ ∈ {0} ∪ N, Furthermore, we use the following notation for the symmetrized autocovariance matrix with lag τ ∈ {0} ∪ N, In the case of univariate stochastic processes x • and y • , we useS τ [x t , y s ] and R τ [x t , y s ] to denote, Note that the unsymmetrized autocovariance matrixS τ is not necessarily conjugate symmetric, i.e., Hermite symmetric or Hermitian, and it can have complex-valued diagonal entries. In contrast, the symmetrized autocovariance matrix S τ is always conjugate symmetric, that is, S τ = S H τ , and the diagonal entries of the symmetrized autocovariance matrix are always real-valued.
In the literature, the definition of stationarity for complex-valued stochastic processes varies. In this paper, we use the following definitions for strict stationarity and weak stationarity.
Definition 2.1. The C d -valued process z • := (z t ) t∈N is strictly stationary if for any τ ∈ N and for any finite subset S ⊂ N, {z t+τ : t ∈ S} has the same distribution as {z t : t ∈ S} .
Similarly as in the real-valued case, the distribution function of the C d -valued random vector z t is the probability measure P zt on (C d , B(C d )) defined as, and if for any τ ∈ {0} ∪ N and for any pair (u, v) We next consider finite sample estimators for location and autocovariance. Let Z := (Z j ) j∈T , T = {1, 2, . . . , T }, be a sampled process generated by z • , where Z j is C d -valued for every j ∈ T . We use the following classical finite sample estimators for the mean vector and the autocovariance matrix with lag τ ∈ {0, 1, . . . , T − 1}, where for τ = 0, the scaling 1/(T − 1) is used forS τ . Furthermore, the symmetrized autocovariance matrix estimator with lag τ ∈ {0, 1, . . . , T − 1} is defined as,Ŝ The symmetrized autocovariance matrix estimator is always conjugate symmetric. Note that the estimation of the eigenvectors of a conjugate symmetric matrix is more stable when compared to the estimation of the eigenvectors of a non-conjugate-symmetric matrix. Let y 1 and y 2 be R d -valued random vectors such that the concatenated random vector, that is, y 1 y 2 , follows a R 2d -valued Gaussian distribution. Then, the C d -valued random vector y = y 1 + iy 2 follows the C d -valued Gaussian distribution with the location parameter µ y , the covariance matrix Σ y and the relation matrix P y , defined as follows, We distinguish between complex-and real-valued Gaussian distributions with the number of given parameters -N d (µ y , Σ) denotes a d-variate realvalued Gaussian distribution and N d (µ y , Σ, P) denotes a d-variate complexvalued Gaussian distribution.
The classical central limit theorem (CLT) for independent and identically distributed complex-valued random variables is given as follows. Let {x 1 , x 2 , . . . , x n } be a collection C d -variate i.i.d. random vectors with square integrable components, mean µ x , covariance matrix Σ and relation matrix P. Then, where D − → denotes convergence in distribution. We can harness the rich literature on real-valued limiting theorems by utilizing the following lemma.
Lemma 2.3 can be, for example, applied to Gaussian vectors in the following manner.
Corollary 2.4. Let {v n } n∈N be a collection of R 2d -valued random vectors v j = x j y j , where x j , y j are R d -valued for every j ∈ N. Let, where α n ↑ ∞ as n → ∞ and, Then, for the sequence of C d -valued random vectors z j = x j + iy j , we have that, Using the notation of Corollary 2.4, we can straightforwardly demonstrate why we need two parameter matrices Σ z , P z and a single parameter vector µ z in order to define a complex-valued Gaussian distribution. The vector µ v contains 2d free parameters of location and the block covariance matrix Σ v contains 2d 2 + d free parameters of variances and covariances. We require that the information, regarding the location and the covariance structure, of the random vectors v and z is equal. Note that neither the covariance matrix Σ z nor the relation matrix P z contains enough information to uniquely construct the covariance matrix Σ v . For example, by considering the realand imaginary parts of Σ z , we get two matrix equations, which is not enough to define Σ x , Σ y and Σ xy uniquely. By defining both Σ z and P z , we get four matrix equations, which is enough to fix the free parameters.
In our work, we utilize Lemma 2.3 and Corollary 2.4 in order to apply realvalued results for complex-valued scenarios, but we wish to emphasize that they also work the other way -the complex-valued asymptotic distribution automatically grants the corresponding real-valued limiting distribution. For additional details regarding complex-valued statistics, see e.g., Eriksson et al. (2010); Goodman (1963); Picinbono (1996).

Temporally uncorrelated components model
In this section, we consider a linear temporally uncorrelated components model for discrete time complex-valued processes. We formulate the so-called unmixing problem and give functionals that solve the corresponding problem. The temporally uncorrelated components mixing model is defined as follows.
Definition 3.1. The C d -process x • := (x t ) t∈N follows the temporally uncorrelated components mixing model, if where A is a constant nonsingular C d×d -matrix, µ x ∈ C d is a constant location parameter and z • is a non-degenerate C d -process with components having continuous marginal distributions. In addition, the process z • satisfies the following four conditions, To improve the fluency of the paper, from hereon, the term mixing model is used to refer to the temporally uncorrelated components mixing model. The conditions (1)-(4) of Definition 3.1 imply that the latent process z • is weakly stationary in the sense of Definition 2.2. Hereby, in the population level, it suffices to consider the covariances and autocovariances for some generic t ∈ N. Note that, under these model assumptions, the concatenated R 2d -process is not necessarily weakly stationary in the classical real-valued sense. One could also require that the concatenated vector is stationary, which can be done by adding a condition involving the complexvalued relation matrix, defined in Section 2, to the model. However, adding the extra condition would also fix the relationship, up to heterogeneous signchanges, between the real-and imaginary parts inside a single component. In this paper, we do not make wish to make assumptions regarding the complex phase of a single component. Consequently, many of the following results involve a so called phase-shift matrix J, that is, a complex-valued diagonal matrix with diagonal entries of the form exp(iθ 1 ), . . . , exp(iθ d ). Note that a phase-shift matrix J is by definition unitary, i.e., JJ H = J H J = I d .
In our approach, the existence of a single diagonal Λ τ suffices, thus making the model more general when compared to real-valued approaches where Λ τ is required to be diagonal for all τ ∈ N, see e.g., Miettinen et al. (2012Miettinen et al. ( , 2016.
For a process x • that follows the mixing model, the corresponding unmixing problem is to uncover the latent process z • , by using only the information contained in the observable process x • . Next, we define affine functionals that we refer to as solutions to the unmixing problem.
Definition 3.2. Let x • := (x t ) t∈N be process that satisfies Definition 3.1, such that condition (4) holds for some fixed τ . The affine functional g : a → Γ(a − µ) : C d → C d is a solution to the corresponding unmixing problem if the process g • x • satisfies conditions (1) -(3) and condition (4) is satisfied with the fixed τ .
Given a process x • that follows the mixing model and a corresponding solution g : a → Γ(a − µ), we refer to A, see Definition 3.1, as the mixing matrix and we refer to Γ as the unmixing matrix. The objective of the unmixing matrix is to reverse the effects of the mixing matrix. However, under our model assumptions, we cannot uniquely determine the relationship between the mixing and the unmixing matrix. The relationship between the two matrices is considered in the following lemma. where J ∈ C d×d is a phase-shift matrix.
The real-valued version of Lemma 3.3 is achieved by considering phaseshift matrices with diagonal elements exp(iθ 1 ), . . . , exp(iθ d ), where θ j are multiples of π. Consequently, in the real-valued case, the matrix J is a so-called sign-change matrix. Note that all possible phase-shift matrices can be constructed by considering phases on some 2π-length interval, e.g., the usual [0, 2π).
By Lemma 3.3, we cannot distinguish between solutions that contain unmixing matrices that are a phase-shift away from each other. Thus, we say that two solutions g 1 : a → Γ 1 (a − µ 1 ) and g 2 : a → Γ 2 (a − µ 2 ) are equivalent, if there exists a phase-shift matrix J such that Γ 1 = Γ 2 J. On the contrary, if g 1 and g 2 are both solutions to the same unmixing problem, it follows that µ 1 = µ 2 .
We next provide a solution procedure for the unmixing problem. Recall that in the mixing model, we assume that the mixing matrix A and the latent process z • are unobservable. Therefore, the solution procedure relies solely on statistics of the observable process x • .
(3.1) and the following scaling-equation, are satisfied.
One can apply Theorem 3.4 in order to find solutions by first estimating the eigenvalues and -vectors of (S 0 [x t ]) −1 S τ [x t ] and then scaling the eigenvectors such that the scaling equation is satisfied. In addition, the eigenvectors should be ordered such that the corresponding eigenvalues, i.e., the diagonal elements of Λ τ are in decreasing order. Such solutions can be generated straightforwardly with the following corollary.
Corollary 3.5. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 and is a solution to the corresponding unmixing problem.
The solutions for the unmixing problem are affine invariant in the following sense.
Lemma 3.6. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 and let g : a → Γ(a − µ) be a corresponding solution. Furthermore, let for some phase-shift matrix J.

Estimation and asymptotic properties
The algorithm for multiple unknown signals extraction (AMUSE), Tong et al. (1990), is a widely applied blind source separation unmixing procedure. The AMUSE procedure can be applied to solve the unmixing problem presented in this paper. AMUSE and the corresponding asymptotic properties have been previously studied in the real-valued case, see e.g, Miettinen et al. (2012). Here, we adopt the estimation part of the AMUSE algorithm. However, our underlying model assumptions are not identical to the ones in Miettinen et al. (2012) and Tong et al. (1990). Thus, in order to avoid misinterpretations, we refrain from using the term AMUSE and we simply use the term unmixing procedure. In this section, we consider the consistency and the limiting distribution of the corresponding unmixing estimator under complex-valued long-range and short-range dependent processes, significantly extending the results given in Miettinen et al. (2012).
Analysis of the temporally uncorrelated components relies on estimating the mean vector µ and the unmixing matrix Γ. Our main focus is on the asymptotic theory of the unmixing matrix estimator, since under the mixture model of this paper, the estimation of the location parameter is straightforward. The following definition presents finite sample solutions for the unmixing problem.
Definition 4.1. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 and let X be a C T ×d -valued, 1 ≤ d < T < ∞, sampled stochastic process generated by x • . Letμ :=μ[X] be the sample mean and let 1 T be a R T -vector full of ones. Then, the mappingĝ : Recall, thatŜ τ is the symmetrized autocovariance matrix, and it is, by definition, conjugate symmetric. In addition, recall that the diagonal entries and the eigenvalues of a conjugate symmetric matrix are, again by definition, real-valued. Hereby, in Definition 4.1, the ordering ofλ Let X j denote the transpose of the jth row of X, that is, the value the sampled C d -process takes at time j. Given a finite sample solution g : C → (C − 1 Tμ )Γ , we get an estimate for the latent process z • from the mappingĝ(X). Note that, under the mapping, we get that X j is mapped as follows, Hereby, Equation (4.1) corresponds to the population version given in Definition 3.2.
As in the population case, we can solve the finite sample unmixing problem by utilizing the following lemma.
Lemma 4.2. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 and let X be a C T ×d -valued, 1 ≤ d < T < ∞, sampled stochastic process generated by where V is unitary. Such matricesΣ 0 , V exists, and the mappinĝ is a solution to the finite sample unmixing problem.
Using Lemma 4.2, we can straightforwardly implement the unmixing procedure. The first step is to calculateŜ 0 :=Ŝ 0 [X], from a realization X, and the corresponding conjugate symmetric inverse square rootΣ 0 :=Ŝ Recall that, by assumption, the components of z • have continuous marginal distributions and the mixing matrix is nonsingular. Thus, the eigenvalues of covariance matrix estimates are always real-valued and almost surely positive. Hereby, the matrixΣ 0 can be obtained conveniently by estimating the eigendecomposition ofŜ 0 . The next step is to choose a lag-parameter τ and estimate the eigendecompositionŜ The covariance matrix and autocovariance matrix estimators are affine equivariant in the sense thatŜ j An estimate for the latent process can then be found via the mapping (X − 1 Tμ )(V HΣ 0 ) , whereμ is the sample mean vector of X.
In practice, one should choose the lag parameter τ = 0 such that the diagonal elements ofΛ τ are as distinct as possible. Strategies for choosing τ are discussed, for example, in Cichocki and Amari (2002). From a computational perspective, the estimation procedure is relatively fast. Hereby, in practice, one should try several different values of τ and study the diagonal elements ofΛ τ . It is usually beneficial to start with lag parameters close to τ = 1, as in several applications the autocovariances tend to diminish as τ grows. In addition, values for τ that are close to T should be avoided, since one might end up estimating the autocovariance matrix using only a small number of observations.
We emphasize that one can apply the theory of this paper to other estimators as well. That is, under minor model assumptions, the estimatorŝ S 0 ,Ŝ τ can be replaced with any matrix valued estimators that have the so-called complex-valued affine equivariance property, see Ilmonen (2013).
The conditions given in Definition 4.1 remain true, if we replaceΓ with JΓ, where J ∈ C d×d can be any phase-shift matrix. In other words, ifΓ is an unmixing matrix estimate, then JΓ is an equally viable estimate. Thus, as in the population level, we say that the estimatesΓ 1 andΓ 2 are equivalent ifΓ 1 = JΓ 2 for some phase-shift matrix J. Furthermore, similarly as in Lemma 3.6, we have that the unmixing matrix estimates are affine invariant in the following sense.
Lemma 4.3. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 and let X be a C T ×d -valued, 1 ≤ d < T < ∞, sampled stochastic process generated by x • and letĝ : C → (C − 1 Tμ )Γ be a corresponding finite sample solution defined in Lemma 4.2. Let B ∈ C d×d be nonsingular, let b ∈ C d , let X = XB + 1 T b and letg : C → (C − 1 Tμ )Γ be a corresponding finite sample solution for X . Then,Γ = JΓB −1 is a finite sample solution for X , where J is some phase-shift matrix.
Justified by the affine invariance property given in Lemmas 3.6 and 4.3, we can, without loss of generality, derive the rest of the theory under the assumption of trivial mixing, that is, the case when the mixing matrix is A = I d . See also the beginning of the Proof of Lemma 4.4.
We next consider limiting properties of the finite sample solutions. Note that, the finite sample statistics and the sampled stochastic process X all depend on the sample size T . Furthermore, we mainly follow the multivariate probabilistic Bachmann-Landau notation presented in Van der Vaart (2000). We use the short expression X T = o p (1) to denote that a sequence of C d×dmatrices X 1 , X 2 . . . , converges in probability to a zero matrix as T → ∞. In addition, we use Y β = O p (1) to denote that a collection of C d×d -matrices {Y β : β ∈ B} is uniformly tight, i.e., bounded in probability under some non-empty indexing set B. By saying that a matrix is uniformly tight, we mean that every element of the matrix is uniformly tight. Note that in the framework of this paper, our usage of the notation is equivalent with the one presented in Van der Vaart (2000).
Lemma 4.4. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 and let X be a C T ×d -valued, 1 ≤ d < T < ∞, sampled stochastic process generated by x • and letĝ : Then, there exists a sequence of T -indexed phase-shift matrices Ĵ, such that the following holds asymptotically, Under the assumptions of Lemma 4.4, we have that the sequence of unmixing matrix estimators is consistent in the sense that there exists a sequence of T -indexed phase-shift matricesĴ such thatĴΓ converges in probability to Γ, as T → ∞. This convergence in probability follows directly from Lemma A.1.
By Theorem 4.5, we can directly find the asymptotic distribution of the unmixing matrix estimatorΓ, if we have the asymptotic distributions, and the convergence rates, of the estimatorsŜ 0 andŜ τ . Note that, if the rates α T and β T , given in Lemma 4.4, differ, the estimator with the faster convergence will tend to zero in probability in Theorem 4.5. Furthermore, note that the asymptotic distributions of the diagonal elements of γ T (ĴΓ − I d ) only depend on the asymptotic distribution of the covariance matrix estimatorŜ 0 . However, the rate of convergence of the off-diagonal elements depends on botĥ S 0 andŜ τ , and consequently, if the covariance estimatorŜ 0 converges faster than the autocovariance estimatorŜ τ , the diagonal elements of γ T (ĴΓ − I d ) converge to zero in distribution and in probability. Conversely, it may happen that the autocovariance estimatorŜ τ converges faster. In that case, the off-diagonal elements of γ T (ĴΓ − I d ) converge to zero in distribution and in probability. Examples are given in Subsection 4.2.
We again want to highlight that, aside from some minor model assumptions, affine equivariance is the only property that we require from the matrix valued estimators, and the estimators can be directly replaced with some other affine equivariant estimators. The estimation procedure could, for example, be robustified and Theorem 4.5 would directly give the asymptotic behavior of the robustified alternative, given that the asymptotic behavior of the robust estimators is known.
We can present Theorem 4.5 in matrix form as follows, τ ) −1 , j = k, denotes the Hadamard, i.e., the entrywise product andĴ is the T -indexed sequence of phase-shift matrices that set the diagonal elements ofΓ to be on the positive real-axis.
Theorem 4.5 can be applied under any nonsingular mixing. Let g andĝ be a solution and a finite sample solution, respectively, under the nonsingular mixing matrix A and denote the corresponding unmixing matrix as Γ and denote the corresponding sample unmixing matrix estimator asΓ. Likewise, let g 0 andĝ 0 be a solution and a finite sample solution, respectively, under the trivial mixing matrix and denote the corresponding sample unmixing matrix estimate asΓ 0 . The following relation can then be applied to generalize results under the trivial mixing to any nonsingular mixing scenario, where A − is the inverse of the transpose, ⊗ denotes the Kronecker product, J is a T -indexed sequence of phase-shift matrices that set the phases of the diagonal elements ofΓ and Γ to be equal andĴ 0 is a T -indexed sequence of phase-shift matrices that set the phases of the diagonal elements ofΓ 0 to be on the positive real-axis.
In the following subsections, we present examples of convergence of the autocovariance and covariance matrix estimators under a broad class of stochastic processes, notably, including long-range dependent processes.

Limiting distributions under summable covariance structures
In this section, we consider a class of stochastic processes that satisfy the Breuer-Major Theorem for weakly stationary processes, which we from hereon refer to as the Breuer-Major Theorem. The Breuer-Major Theorem is considered in the context of Gaussian subordinated processes, see e.g. Breuer and Major (1983); Arcones (1994); Nourdin et al. (2011). A univariate real-valued Gaussian subordinated process z • , defined on a probability space (Ω, F , P), is a weakly stationary process that can be expressed in the form Gaussian process and f : R → R is a measurable function. We emphasize that Gaussian subordinated processes form a very rich model class. For example, recently in Viitasaari and Ilmonen (2020), the authors showed that arbitrary one dimensional marginal distributions and a rich class of covariance structures can be modeled by f • y • with simple univariate stationary Gaussian process y • . While the model class is very rich, underlying driving Gaussian processes still provide a large toolbox for analyzing limiting behavior of various estimators. Such limit theorems have been a topic of active research for decades. For recent relevant papers on the topic, we refer to Nourdin et al. (2011); Nourdin and Nualart (2019); Nourdin and Peccati (2009, 2012 and the references therein. We next give the definitions of Hermite polynomials and Hermite ranks. We define the kth, k ∈ {0} ∪ N, (probabilistic) Hermite polynomial H k , using Rodrigues' formula, ∪ N} forms an orthonormal basis on the Hilbert-space L 2 (R, P y ), where P y denotes the law of a univariate standard Gaussian random variable. Consequently, every function f ∈ L 2 (R, P y ) can be decomposed as, where a k ∈ R for every k ∈ {0} ∪ N. If x and y follow the univariate standard Gaussian distribution, the orthogonality of the Hermite polynomials and the decomposition of Equation (4.2) give, The Hermite rank for a function f is defined as follows.
Definition 4.6 (Hermite rank). Let y be a R -valued Gaussian random vector, ∈ N, and let f : R → R be a function such that f • y is squareintegrable. The function f has Hermite rank q, with respect to y, if for all polynomials p m : R d → R that are of degree m ≤ q − 1 and there exists a polynomial p q of degree q such that Note that, in one dimensional setting, the Hermite rank q of a function f is the smallest non-negative integer in Equation (4.2), such that α q = 0. We next present the multivariate version of the well-known Breuer-Major Theorem from Breuer and Major (1983).
Theorem 4.7. Let y • := (y t ) t∈N be a R -valued centered stationary Gaussian process. Let f 1 , f 2 , . . . , f d : R → R be measurable functions that have at least Hermite rank q ∈ N, with respect to y • , and let f j • y • be square-integrable for every j ∈ {1, . . . d}. Additionally, let the series of covariances satisfy converges in distribution, as T → ∞, to a centered Gaussian vector ρ = ρ 1 · · · ρ d with finite covariances E [ρ j ρ k ] equal to, The proof of Theorem 4.7 is omitted here. Theorem 4.7 follows directly from the univariate Breuer-Major Theorem, given in Breuer and Major (1983), and by using the usual Cramér-Wold device, see e.g. Billingsley (2012) [Theorem 29.4]. A similar version of Theorem 4.7 can also be found in Section 5 of Arcones (1994).
We next present the following assumption which enables us to find the asymptotic behaviour of the unmixing matrix estimator using the Breuer-Major Theorem.
Assumption 4.8. Let x • := (x t ) t∈N be a process that satisfies Definition 3.1 with the trivial mixing matrix A = I d and let X be a C T ×d -valued, 1 ≤ d < T < ∞, sampled stochastic process generated by x • . Denote z • = (b t +ic t ) t∈N = b • +ic • . We assume that there exists ∈ N and a centered R -valued stationary Gaussian process η • , such that, for all k ∈ {1, . . . d}, the component b (k) • has the same finite-dimensional distributions and, asymptotically, the same autocovariance function asf k (η • ), for some functioñ f k : R → R. Similarly, we assume that, for all k ∈ {1, . . . d}, the component c (k) • has the same finite-dimensional distributions and, asymptotically, the same autocovariance function asf k+d (η • ), for some functionf k : R → R. Furthermore, we assume that E[|f k (η 1 )| 4 ] < +∞, ∀k ∈ {1, . . . , 2d}, and that r Note that, the finite dimensional distributions of the R-valued stochastic process c • is the collection of probability measures defined as, We again want to emphasize that a wide class of stochastic processes satisfy Assumption 4.8. Indeed, we allow an arbitrary dimension for the driving Gaussian process η • and arbitrary (summable) covariance structures. In comparison, it was shown in Viitasaari and Ilmonen (2020) that in many cases it would suffice to relate only one stationary Gaussian process η (j) • to each functionf j .
We are now ready to consider the asymptotic distribution for the unmixing matrix estimator under Assumption 4.8.
Theorem 4.9. Let Assumption 4.8 be satisfied and letĝ : C → (C − 1 Tμ )Γ be a T -indexed sequence of finite sample solutions. Then, under the trivial mixing scenario A = I d , there exists a T -indexed sequence Ĵ of phase-shift matrices, such that The exact forms for Σ ν and P ν are given in Appendix A.
Note that, it is possible to present Theorem 4.9 with even weaker assumptions. In the current formulation, we require that the Gaussian process y • (equivalently, η • ) has summable covariances and cross-covariances. By studying the exact Hermite ranks of the underlying functions, weaker summability conditions would suffice (cf. Theorem 4.7). However, given the Hermite ranks of the functions f j , it is in general impossible to say anything about the Hermite ranks of transformations such as f 2 j arising fromŜ 0 , see the proof of Theorem 4.9. On the other hand, assuming Hermite rank equal to one is a very natural assumption in many occasions. For example, the modeling approach given in Viitasaari and Ilmonen (2020) sets the rank of f j to equal 1. Moreover, Hermite rank 1 is stable in a sense that even small perturbations in a function with higher Hermite rank leads to rank 1 again. For further discussion on the stability of Hermite rank 1, we refer to Bai and Taqqu (2014).

Notes on non-central limit theorems
In this section, we provide examples where the convergence rate of the unmixing estimator differs from the standard √ T and where the limiting distribution is non-Gaussian. Such situations arise, especially, when the convergence summability condition of Theorem 4.7 does not hold.
In comparison to Assumption 4.8, we here assume that real-and imaginary parts of each component is driven by a single Gaussian process. As discussed in Section 4.1, this is not a huge restriction. In addition, we assume independent components and that the real-and imaginary parts of each component are independent. Although, independence is a common assumption in the blind source separation literature, we note that our results can be extended to cover dependencies as long as, for every j, k ∈ {1, 2, . . . , 2d}, the cross-covariances between η (k) and η (j) are negligible compared to the decay of the autocovariance functions r η (j) (t).
If the autocovariance functions r η (j) satisfy, then we are in the situation of Theorem 4.9. More generally, a weakly stationary series is called short-range dependent if the corresponding autocovariance function r satisfies Equation (4.3). Hence, we assume that at least one of the functions r η (j) is not summable. For such components, we assume the following long-range dependence condition.
Note that, the definition of long-range dependence varies in the literature. For details on long-range dependent processes and their different definitions, we refer to Beran (1994); Beran et al. (2013); Samorodnitsky (2007).
There is a large literature on limit theorems under long-range dependence. See, e.g., Dobrushin and Major (1979); Taqqu (1975); Bai and Taqqu (2013) for a few central works on the topic. In the case of long-range dependent processes, the rate of convergence of the normalized mean is slower than the usual √ T and the limiting distribution depends on the corresponding Hermite rank. More precisely, the limiting distribution follows a so-called q-Hermite distribution, where q is the Hermite rank of the underlying function. Note that, q-Hermite distributions can be fully characterized with the corresponding Hermite rank q and a self-similarity parameter H, i.e., the Hurst index. In the stable case q = 1, we obtain a Gaussian limit, and in the case q = 2 we obtain the so-called Rosenblatt distribution that is not Gaussian. Similarly, for q ≥ 3 the limiting distribution is not Gaussian. For details on Hermite distributions and processes, we refer to Embrechts and Maejima (2002); Tudor (2013); Bai and Taqqu (2013). In particular, we apply the following known result (see (Dobrushin and Major, 1979, Theorem 1)).
In view of Proposition 4.12, in the long-range dependent case the Hermite rank plays a crucial role. Thus, one cannot pose general results without any a priori knowledge on the ranks. Indeed, it can be shown -see, e.g., (Beran et al., 2013, Equation 4.26) with d = H − 1/2 -that if f has Hermite rank q, and r satisfies Equation (4.4) with some H such that q(2H − 2) > −1, then This justifies the normalization in Proposition 4.12 and also gives the constant explicitly, as E[Z q ] = 0 and E[(Z q ) 2 ] = 1. We pose the following assumption for the autocovariances r η (j) and Hermite ranks.
Assumption 4.13. We assume that for every j ∈ {1, 2, . . . , 2d} the autocovariance functions r η (j) with r η (j) (0) = 1 satisfy either Equation by q 2,i . For any indices j, k ∈ I, j = k, we assume that (4.6) and that for any We stress that Assumption 4.13 is not very restrictive. We allow a combination of short-and long-range processes η (j) • and we assume that at least one of the processes is long-range dependent, since otherwise we may apply Theorem 4.9. In this respect, Condition (4.6) guarantees that at least one of the [f i (x) − E(f i (η (i) ))] 2 is long-range dependent. Indeed, if max i∈I q i (2H i − 2) < −1, then Theorem 4.9 applies again. Moreover, one could also study the limiting case max i∈I q i (2H i − 2) = −1. Then, one usually obtains a Gaussian limit, but with a rate given by T /log(T ). Our more restrictive Assumptions (4.6) and (4.7) are posed in order to apply known results on limit theorems in the long-range dependent case. Indeed, if these conditions were violated, then we would need to study the convergence of complicated dependent random vectors towards some combinations of Hermite distributions. Unfortunately, to the best of our knowledge, there are no studies in this topic that would fit into our general setting (on some results in special cases, see e.g. Bai and Taqqu (2013)).
In view of Proposition 4.12, the rate of convergence, T γ , of our unmixing estimator arises from Equation (4.6). We set By Equation (4.6), we have that γ < 1 2 , meaning that our rate is slower than the usual √ T . The following results show that cross-terms do not contribute to the limit.
Proposition 4.14. Let Assumptions 4.10 and 4.13 be satisfied. Then for every j = k, we have that, and where L 2 − − → denotes convergence in the space L 2 (Ω, P). Consequently, the convergence also holds in probability.
In the presence of long-range dependency, the mean estimatorμ [X] in the definition ofS τ [X] has to be studied separately as it may contribute to the limit. andμ =μ − µ. Then for every element (j, k), such that j, k ∈ {1, 2, . . . , d} (Ω, P). Consequently, the convergence also holds in probability.
In order to obtain the limiting distribution for T γ (ĴΓ − I d ), we introduce some notation. For each k ∈ {1, 2, . . . , 2d} we define Hermite processes Z q 1,k and Z q 2,k , such that for a given k, we allow Z q 1,k and Z q 2,k to be dependent on each other, while both are independent of Z q 1,j , Z q 2,j for j = k. We also introduce constants C 1,k , C 2,k in the following way: if k ∈ I is such that equality holds in (4.7), i.e., max i∈I q 2,i (2H i − 2) = 2[q 1,k (2H k − 2)], then we let C 1,k to be the constant given in Proposition 4.12 associated to the limit Otherwise we set C 1,k = 0. Similarly, if k ∈ I is such that (4.11) we let C 2,k to be the constant given in Proposition 4.12 associated to the limit Otherwise we set C 2,k = 0. The following theorem is our main result of this subsection.
Theorem 4.16. Let Assumptions 4.10 and 4.13 hold. Then, where Υ is a R d×d -valued diagonal matrix with elements given by (Υ) jj ∼ C 2,j Z q 2,j + C 2 1,j Z 2 q 1,j + C 2,j+d Z q 2,j+d + C 2 1,j+d Z 2 q 1,j+d . We remark that it might be that most of the elements vanish. Indeed, the coefficient C 2,j (C 2,j+d , respectively) is non-zero only if the real part (imaginary part, respectively) of [f j (x) − E(f j (η))] 2 converges with the correct rate T γ . In this case, C 1,j (C 1,j+d , respectively) is non-zero only if the corresponding element inμ converges with the rate T γ/2 .

Image source separation
In this section, we present an image source separation example, where complex-valued signals are utilized. The restoration of images, which have been mixed together by some transformation, is a classical example in blind source separation (BSS), see e.g. Nordhausen et al. (2008). In the literature, the images source separation is usually only considered for black and white images. In the black and white case, we can choose some interval, for example [0,1], and assign every pixel a value belonging to the interval, e.g., such that the pixels with value 1 are black and pixels with value 0 are white. Different shades of gray can then be generated by considering values between 0 and 1. Since there is only a single color parameter associated with a single pixel, the black and white image source problem can be straightforwardly formulated as a real-valued BSS problem.
In our example, we consider colored photographs and conduct the example using the statistical software R (R Core Team (2018)). We start with three equally sized photographs containing 960 × 720 pixels and import them to R using the package JPEG, Urbanek (2014). The imported images are stored in a 960 × 720 × 3 array format, such that a red-green-blue color parameter triplet, denoted as (R, G, B), is assigned to every pixel. The R, G and B parameters are discrete, such that they can take 256 distinct values between zero and one.
Note that the color parameters R, G and B form a discrete color cube. As a preliminary step, we transform the original images such that the color parameter triplet of every pixel is projected to the closest point on the surface of the color cube. Hereby, we have for every pixel that either min(R, G, B) = 0 or max(R, G, B) = 1, without excluding the possibility of them both being simultaneously true. The color corrected images are presented in Figure 1. The color correction enables us to define a bijective transformation between the color cube surface, which we require for plotting the images, and the complex plane, where we perform the mixing and unmixing procedures.
After the color correction, we apply a bijective cube to unit sphere transformation for every color triplet. Then, we use the well-known stereographic projection, which is an almost bijective transformation between the unit sphere and the complex plane. The stereographic projection is bijective everywhere except for the north pole of the unit sphere. For almost all photographs, we can choose the color coordinate system such that no pixel has a color triplet located on the north pole. This holds for our example and we can apply the inverse mappings in order to get back to the color cube surface.
We then have a single complex-number corresponding to every pixel of the color corrected images and we can present the images in a C 960×720 -matrix form. We denote the images with complex-valued elements as Z 1 , Z 2 and Z 3 and apply the following transformation, where I 3 is the identity matrix and the elements of E and F were generated independently from the univariate uniform distribution Unif(-1/2, 1/2). The complex-valued mixed images X 1 , X 2 and X 3 can then be plotted by performing the chain of inverse mappings in reverse order. The corresponding images are presented in Figure 2.
We then apply the unmixing procedure presented in Section 4, with several different lag parameters τ . In controlled settings, where the true mixing matrix is known, the minimum distance index (MD) index is a straightforward way to compare the performance of different estimators, see Lietzén et al. (2016); Lietzén et al. (2019) for the complex-valued formulation. The MD index is a performance index, defined between zero and one, where zero corresponds to the separation being perfect. In addition, the MD index ensures that comparison between different estimators is fair, this is especially important in this paper since under our model assumptions, different estimators do not necessarily estimate the same population quantities. For additional details, see Ilmonen et al. (2012a); Lietzén et al. (2016); Lietzén et al. (2019) We tried several different lag parameters and the best performance, in the light of the MD index and visual inspection, was attained with τ = 1 and the Figure 3: Unmixed images using the AMUSE procedure with τ = 1.
corresponding MD index value was approximately 0.173. The unmixed images obtained using the unmixing procedure with τ = 1 are presented in Figure  3. In addition, for example, lag parameter choices τ = 2, 3, 961, 962, 963 provided only slightly worse results, the corresponding MD index values were between 0.177 and 0.183. Note that the autocovariances are calculated from the vectorized images, and for example the first and the 961st entries are neighboring pixels in the unvectorized images.
For comparison, we also applied complex-valued versions of the fourthorder blind identification (FOBI) and the second-order blind identification (SOBI) procedures, see Ilmonen (2013); Belouchrani et al. (1997). In this example, the the unmixing procedure with τ = 1 outperformed both FOBI and SOBI. The MD index of the FOBI procedure was approximately 0.318. We applied the SOBI procedure with several different lag parameter combinations, and the best performance, when excluding the cases that correspond to the AMUSE transformation, was obtained with τ = {1, 962} and the corresponding MD index value was approximately 0.177.
When comparing the original color corrected images in Figure 1 and the unmixed images in Figure 3, the shapes seem to match almost perfectly. However, the difference of the colors is significant. The colors vary since the complex phase is not uniquely fixed in our model, recall Lemma 3.3. In addition, recall that under our model, solutions that are a phase-shift away from each other are considered to be equivalent. In Figure 4, we present the first unmixed image, produced by the unmixing procedure with τ = 1, under three equivalent solutions. The images in Figure 4 are obtained such that we first find an unmixing estimateΓ and then right-multiply the obtained estimate with a diagonal matrix with diagonal entries exp(iθ). In Figure 4, the leftmost image is obtained with θ = π/4, the middle is obtained with θ = 3π/4 and the rightmost image is obtained with θ = 5π/4. Kilpinen for providing the photographs for Section 5.

A Appendix
Appendix A contains the proofs of the results given in this paper. We begin by presenting a lemma, that we later utilize in the upcoming proofs.
Lemma A.1. LetΣ be a T -indexed sequence of C d×d -valued nonsingular estimates and let α T be a real-valued sequence that satisfies α T ↑ ∞ as T → ∞. Furthermore, let α T (Σ − I d ) = O p (1). Then, the following two statements hold.
Proof of Lemma A.1. The assumption α T (Σ − I d ) = O p (1), implies that as T → ∞. Note that the matrix inversion and the conjugate symmetric square root of the inversion are continuous transformations in the neighborhood of I d . Thus, we can apply the continuous mapping theorem, which gives us that Σ a P − → I d , as T → ∞, for a ∈ {−1, − 1 2 }. Using part (i) and Slutsky's lemma, we get that the inverse is uniformly tight, since, For the final part, first note that, .
SinceΣ − 1 2 + I d converges in probability to 2I d and the matrix inversion is a continuous transformation in the neighborhood of 2I d , the continuous mapping theorem gives us that ( Thus by Slutsky's lemma, Proof of Lemma 2.3. Both directions of the claim follow directly from the multivariate version of the continuous mapping theorem. Note that, the mapping f : (r 1 , r 2 , . . . , r 2d ) → (r 1 , r 2 , . . . , r d ) + i(r d+1 , r d+2 , . . . , r 2d ) : R 2d → C d is homeomorphic, that is, f is continuous and bijective, and the preimage f −1 is also continuous.
Proof of Corollary 2.4. The corollary follows directly by applying Lemma 2.3 and by calculating the mean, the covariance and the relation matrix of z.
Proof of Lemma 3.3. Let y • := g • x • . First, let y • be a solution, that is, y • satisfies conditions (1)-(4) of Definition 3.1. Using condition (1), we get that, and thus µ x = µ. Next, we can rewrite condition (2) as, which implies that ΓA is a unitary matrix. Similarly, we can rewrite condition (4) as, which is equivalent with, Since Λ τ has real-valued distinct diagonal entries and since Λ τ and A H Γ H commute, we get that ΓA is also a diagonal matrix. Consequently, ΓA is a unitary diagonal matrix, which implies that the diagonal elements of the matrix product are of the form exp(iθ 1 ), . . . , exp(iθ d ).
For the second part of the proof, let µ = µ x and ΓA = J, where J is some phase-shift matrix. We next verify that y • := g • x • is a solution, that is, we verify that y • satisfies conditions (1)-(4) of Definition 3.1. Condition (1) is clearly satisfied, since E[x t ] = µ x . Furthermore, we have that the following holds for every τ, t ∈ N, Thus, conditions (2)-(3) are satisfied. Finally, we have for a fixed τ and for every t ∈ N that, since diagonal matrices commute. Thus, condition (4) is satisfied and the proof is complete.
Proof of Theorem 3.4. In order to simplify the notation, we denote S 0 := S 0 [x t ] and S τ := S τ [x t ]. First, let g be a solution. Then, since x • follows the mixing model, Lemma 3.3 gives, Using the above expressions, we get that, and Equation (3.1) follows by left-multiplying with S −1 0 . In addition, Lemma 3.3 directly gives Equation (3.2). Since the process x • is weakly stationary, the previous is true for every t ∈ N.
Next, let Equations (3.1) and (3.2) be satisfied for every t ∈ N. Leftmultiplying with S 0 , Equation (3.1) can be reformulated as, which, after a left-multiplication with A −1 , gives that A H Γ H and Λ τ commute. Since the diagonal matrix Λ τ has real-valued distinct diagonal elements, we get that A H Γ H is a diagonal matrix. Then, the scaling-equation gives that ΓAA H Γ H = I d , i.e., ΓA is a unitary matrix. Consequently, ΓA = J, where J is some phase-shift matrix. By Lemma 3.3, the functional g is hereby a solution to the corresponding unmixing problem, and the proof is complete.
Proof of Corollary 3.5. Under the assumptions of Definition 3.1, we have that S 0 is a positive-definite matrix and that S τ is a conjugate symmetric matrix. Thus, the matrix-square root of S 0 exists and the conjugate symmetric matrix-square root is unique, and consequently similar arguments as in the real-valued counterpart presented in Ilmonen et al. (2012b) can be applied here. Hereby, S Proof of Lemma 3.6. By Lemma 3.3, we have that ΓA = J 1 andΓCA = J 2 , where J 1 and J 2 are some phase-shift matrices. Recall that the mixing matrix A is nonsingular. Hereby, we get that Γ = J 1 A −1 andΓ = J 2 (CA) −1 . By using the obtained expressions for Γ andΓ, we get that, where J 3 = J 2 J H 1 is a phase-shift matrix, as the set of phase-shift matrices is stable under matrix multiplication.
Proof of Lemma 4.2. By assumption, the components of z • have continuous marginals, which implies that covariance matrix estimates are almost surely nonsingular. Thus, we can almost surely find a unique matrixΣ 0 , which is the conjugate symmetric inverse square root ofŜ 0 . We proceed to verify thatΓ =V Proof of Lemma 4.3. By Lemma 4.2, we can writeΓ =V HΣ 0 , whereΣ 0 denotes the conjugate symmetric inverse square root ofŜ 0 . Note that the matrix-valued estimators are affine equivariant in the sense that, j ∈ {0, τ }, for all nonsingular C d×d -matrices B and all C d -vectors b. We proceed to verify that JΓB −1 satisfies the criteria of a finite sample solution, given in Definition 4.1. Using the affine equivariance property, we get that, and, Hereby, JΓB −1 is a finite sample solution forX and the proof is complete.
We next denoteΣ τ =Σ 0ŜτΣ0 and show that γ T (Σ τ − Λ τ ) = O p (1). The uniform tightness follows from Lemma A.1, Slutsky's Lemma and the following factorization, Next, recall the eigendecompositionΣ τ =VΛ τV H and that the space of unitary matrices is compact. Consequently, U = O p (1) for any unitary U. By right-multiplying both sides of the eigendecomposition withV and by subtracting Λ τ from both sides, we get, where both sides of the equation can be further factorized as follows, and by multiplying with γ T and by rearranging the terms, we obtain, By assumption, the diagonal matrix Λ τ has distinct real-valued diagonal elements. Furthermore, since the matrixΛ τ is obtained by estimating an eigendecomposition, it is also diagonal, with diagonal elements denoted aŝ λ τ . Then, consider the element (j, k), j = k, of Equation (A.1), converges in probability to λ (k) τ . Furthermore, since the diagonal elements of Λ τ are distinct, we can divide both sides of Equation (A.2) with λ τ , which gives us that γ T [V] jk = O p (1) holds asymptotically when j = k.
Next, letĴ be a T -indexed sequence of phase-shift matrices, such that the diagonal entries of the productVĴ H are in the positive real-axis for every T ∈ N. Note that any complex-number can be expressed in the form r exp(iθ), where θ is the phase and r is the modulus, i.e., the length of the complex number. In the matrix case, we can similarly expressV such thatV =V rVθ , whereV θ is a phase-shift matrix that contains the phases of the diagonal diagonal elements ofV and consequentlyV r is a matrix with real-valued diagonal entries. Then, given a phase-shift matrixV θ = diag(exp(iθ 1 , . . . , iθ d )), we can always construct a matrix J H = diag(exp(−iθ 1 , . . . , −iθ d )) such that the productVĴ H has real-valued diagonal entries.
After the diagonal entries have been rotated to the positive real-axis, the rotated diagonal elements are equal to the corresponding moduli, that is, V kkĴ H kk = |V kk |. Then, sinceV is unitary, each of its row vectors has length one and the absolute value of a single element is at most one, which gives us, By unitarity ofV, the above square root is always between zero and one, and thus squaring the square root produces a smaller or equal number. Hereby, using the (asymptotic) uniform tightness of the off-diagonal elements, we get that asymptotically, Hereby, by combining the results for the diagonal and off-diagonal elements, we get that there exists a sequence of phase-shift matricesĴ such that γ T (VĴ H − I d ) = O p (1) holds asymptotically.
The claim of the lemma can then be written as, Proof of Theorem 4.5. LetŜ j :=Ŝ j [X], τ ∈ {0, τ }, and recall the equations from Definition 4.1, Similarly, by considering the element (j, k), j = k, of Equation (A.5), we get that The theorem then follows by multiplying both sides with γ T .
Note that the diagonal elements of Σ ν are real-valued and recall thath j,j = 0 for every j ∈ {1, . . . d}.
The convergence follows from independence and the continuous mapping theorem, if, ∀j ∈ {1, . . . , 2d}, the following two-dimensional vector converges, By using Equation (4.5) for the asymptotic variance, we observe first that converges to a non-trivial limit only if max i∈I q 2,i (2H i − 2) = q 2,j (2H j − 2). Similarly, converges to a non-trivial limit only if max i∈I q 2,i (2H i − 2) = q 1,j (4H j − 4). Finally, if both of these conditions are satisfied, the convergence in Equation (A.8) follows from (Bai and Taqqu, 2013, Theorem 4).