Goodness-of-fit test in a multivariate errors-in-variables model $AX=B$

We consider a multivariate functional errors-in-variables model $AX\approx B$, where the data matrices $A$ and $B$ are observed with errors and the matrix parameter $X$ is to be estimated. A goodness-of-fit test is constructed based on the total least squares estimator. The proposed test statistic is asymptotically chi-squared under the null hypothesis. The power of the test under local alternatives is also discussed.


Introduction
We study an overdetermined system of linear equations $AX \approx B$, which often occurs in problems of dynamical system identification [10]. If the matrices $A$ and $B$ are observed with additive uncorrelated errors of equal size, then the total least squares (TLS) method is used to solve the system [10].
In the papers [3,7,9], under various conditions, the consistency of the TLS estimator $\hat X$ is proven as the number $m$ of rows of the matrix $A$ increases, assuming that the true value $A_0$ of the input matrix is nonrandom. The asymptotic normality of the estimator is studied in [3] and [6].
The model $AX \approx B$ with random measurement errors corresponds to the vector linear errors-in-variables model (EIVM). In [2], a goodness-of-fit test is constructed for a polynomial EIVM with nonrandom latent variable (i.e., in the functional case); the test can also be used in the structural case, where the latent variable is random with unknown probability distribution. A more powerful test in the polynomial EIVM is elaborated in [4].
In the paper [5], a goodness-of-fit test is constructed for the functional model $AX \approx B$, assuming that the error matrices $\tilde A$ and $\tilde B$ are independent and the covariance structure of $\tilde A$ is known. In the present paper, we construct a goodness-of-fit test in a more common situation, where the total covariance structure of the matrices $\tilde A$ and $\tilde B$ is known up to a scalar factor. The test statistic is based on the TLS estimator $\hat X$. The asymptotic behavior of the test statistic is studied under the null hypothesis based on results of [6] and, under local alternatives, based on [9].
The present paper is organized as follows. In Section 2, we describe the observation model, introduce the TLS estimator, and state known results on the strong consistency and asymptotic normality of the estimator. In the next section, we construct the goodness-of-fit test and show that the proposed test statistic has an asymptotic chi-squared distribution with the corresponding number of degrees of freedom. The power of the test with respect to local alternatives is studied in Section 4, and Section 5 concludes. The proofs are given in the Appendix.

We use the following notation: $\|C\| = \sqrt{\sum_{i,j} c_{ij}^2}$ is the Frobenius norm of a matrix $C = (c_{ij})$, and $I_p$ is the unit matrix of size $p$. The symbol $\mathsf{E}$ denotes the expectation and acts as an operator on the total product of quantities, and $\operatorname{cov}$ means the covariance matrix of a random vector. The superscript $\top$ denotes transposition. In the paper, all vectors are column vectors. The bar means averaging over $i = 1, \dots, m$; for example, $\bar a = \frac{1}{m}\sum_{i=1}^m a_i$. Convergence with probability one, in probability, and in distribution are denoted by $\xrightarrow{\mathrm{P1}}$, $\xrightarrow{\mathrm{P}}$, and $\xrightarrow{\mathrm{d}}$, respectively. A sequence of random matrices that converges to zero in probability is denoted by $o_p(1)$, and a sequence of stochastically bounded random matrices by $O_p(1)$. The notation $\varepsilon \stackrel{d}{=} \varepsilon_1$ means that the random variables $\varepsilon$ and $\varepsilon_1$ have the same probability distribution. Positive constants that do not depend on the sample size $m$ are denoted by $\mathrm{const}$, so that equalities like $2 \cdot \mathrm{const} = \mathrm{const}$ are possible.

Consider the observation model
$$A = A_0 + \tilde A, \qquad B = B_0 + \tilde B, \qquad A_0 X_0 = B_0, \quad (2.1)$$
where $A_0 \in \mathbb{R}^{m\times n}$, $X_0 \in \mathbb{R}^{n\times d}$, and $B_0 \in \mathbb{R}^{m\times d}$. The matrices $A$ and $B$ contain the data, $A_0$ and $B_0$ are unknown nonrandom matrices, and $\tilde A$, $\tilde B$ are the matrices of random errors.

We can rewrite model (2.1) in an implicit way. Introduce three matrices of size $m \times (n+d)$:
$$C = [A \;\; B], \qquad C_0 = [A_0 \;\; B_0], \qquad \tilde C = [\tilde A \;\; \tilde B],$$
so that $C = C_0 + \tilde C$ and model (2.1) takes the implicit form
$$C_0 \begin{pmatrix} X_0 \\ -I_d \end{pmatrix} = 0. \quad (2.2)$$
Denote by $c_i^\top$, $i = 1, \dots, m$, the rows of the matrix $C$, and use similar notation for the rows of the matrices $A_0$, $B_0$, $\tilde A$, $\tilde B$, and $\tilde C$. Rewrite model (2.1) as a multivariate linear one:
$$a_i = a_i^0 + \tilde a_i, \qquad b_i = X_0^\top a_i^0 + \tilde b_i, \qquad i = 1, \dots, m. \quad (2.3)\text{--}(2.4)$$
Throughout the paper, the following assumption (i) holds about the errors: the vectors $\tilde c_i$, $i = 1, \dots, m$, are identically distributed with zero mean, and, moreover, $\operatorname{cov}(\tilde c_1) = \sigma^2 I_{n+d}$ with an unknown factor $\sigma^2 > 0$. Thus, the total error covariance structure is assumed to be known up to a scalar factor $\sigma^2$, and the errors are uncorrelated with equal variances.
For model (2.1), the TLS problem consists in finding disturbances $\Delta A$ and $\Delta B$ that minimize the sum of squared corrections,
$$\min_{X \in \mathbb{R}^{n\times d},\, \Delta A,\, \Delta B} \bigl( \|\Delta A\|^2 + \|\Delta B\|^2 \bigr) \quad (2.6)$$
subject to the constraint
$$(A - \Delta A)\,X = B - \Delta B. \quad (2.7)$$
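The constrained problem (2.6)-(2.7) admits the classical closed-form solution via the singular value decomposition of the compound matrix $[A\;B]$. The sketch below illustrates that standard construction (it is not code from the paper) and assumes the lower-right block of right singular vectors is nonsingular, which is the generic case:

```python
import numpy as np

def tls_estimate(A, B):
    """Total least squares solution of A X ~ B via the SVD of the
    compound matrix C = [A B] (classical SVD construction)."""
    m, n = A.shape
    d = B.shape[1]
    # Right singular vectors of the compound data matrix.
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    V12 = V[:n, n:]   # n x d upper-right block
    V22 = V[n:, n:]   # d x d lower-right block, assumed nonsingular
    return -V12 @ np.linalg.inv(V22)
```

With errors simulated according to the equal-variance uncorrelated structure of assumption (i), the estimate recovers $X_0$ closely for moderate noise and large $m$.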

The TLS estimator and its consistency
It can happen that, for a certain random realization, the optimization problem (2.6)-(2.7) has no solution. In the latter case, we set $\hat X = \infty$. We need the following conditions to ensure the consistency of the estimator: The next result on the strong consistency of the estimator follows, for example, from Theorem 4.3 in [9].

Theorem 2. Assume conditions (i)-(iii). Then, with probability one, for all $m \ge m_0(\omega)$, the TLS estimator $\hat X$ is finite and, moreover, $\hat X \xrightarrow{\mathrm{P1}} X_0$ as $m \to \infty$.
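As an informal numerical illustration of Theorem 2 (not part of the paper's argument), one can watch the TLS estimation error shrink as $m$ grows. The simulation below assumes i.i.d. Gaussian errors, one admissible case of condition (i):

```python
import numpy as np

def tls_error(m, n=2, d=1, sigma=0.1, seed=0):
    """Return ||Xhat - X0|| for one simulated data set with m rows.
    Errors are i.i.d. N(0, sigma^2) in both A and B, matching the
    equal-variance uncorrelated structure of condition (i)."""
    rng = np.random.default_rng(seed)
    X0 = np.ones((n, d))
    A0 = rng.normal(size=(m, n))
    A = A0 + sigma * rng.normal(size=(m, n))
    B = A0 @ X0 + sigma * rng.normal(size=(m, d))
    # SVD-based TLS solution on the compound matrix [A B].
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    Xhat = -V[:n, n:] @ np.linalg.inv(V[n:, n:])
    return float(np.linalg.norm(Xhat - X0))
```

Increasing $m$ by two orders of magnitude typically reduces the error by about one order of magnitude, in line with the $\sqrt{m}$-rate behind Theorem 4.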
Define the loss function $Q(X)$ as follows: It is known that the TLS estimator minimizes the loss function (2.9); see formula (24) in [7]. Introduce the following unbiased estimating function related to the elementary loss function (2.8):

Lemma 3. Assume conditions (i)-(iii). Then, with probability one, for all $m \ge m_0(\omega)$, the TLS estimator $\hat X$ is a solution to the equation In view of Theorem 2, the statement of Lemma 3 follows from Corollary 4(a) in [6].
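The displayed formulas (2.8)-(2.9) are not reproduced in this copy; the sketch below assumes the standard weighted TLS form $Q(X) = \sum_i (b_i - X^\top a_i)^\top (I_d + X^\top X)^{-1} (b_i - X^\top a_i)$, which is the objective minimized by the SVD-based TLS solution (cf. formula (24) in [7]):

```python
import numpy as np

def tls_loss(A, B, X):
    """Assumed TLS loss Q(X) = sum_i r_i^T (I_d + X^T X)^{-1} r_i,
    with residuals r_i = b_i - X^T a_i."""
    R = B - A @ X                                     # m x d residual matrix
    W = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)   # weighting (I_d + X^T X)^{-1}
    # sum_i r_i^T W r_i as a single contraction
    return float(np.einsum('ij,jk,ik->', R, W, R))
```

On noiseless data the loss vanishes at $X_0$, and on noisy data the SVD-based TLS solution attains a loss no larger than that of $X_0$, consistent with the minimization property stated above.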

Asymptotic normality of the estimator
We need further restrictions on the model. Recall that the augmented errors $\tilde c_i$ were introduced in Section 2.2, and the vectors $a_i^0$, $\tilde b_i$, and so on are those from model (2.3)-(2.4).
(iv) $\mathsf{E}\,\|\tilde c_1\|^{4+2\delta} < \infty$ for some $\delta > 0$;
(v) for $\delta$ from condition (iv),
(vi) for all $p, q, r = 1, \dots, n+d$, we have $\mathsf{E}\,\tilde c_1^{(p)} \tilde c_1^{(q)} \tilde c_1^{(r)} = 0$.
Under assumptions (i) and (iv), condition (vi) holds, for example, in two cases: (a) when the random vector $\tilde c_1$ is symmetrically distributed, or (b) when the components of the vector $\tilde c_1$ are independent and, moreover, for each $p = 1, \dots, n+d$, the skewness of the random variable $\tilde c_1^{(p)}$ equals 0. Introduce the following random element in the space of collections of five matrices:

Denote by $\tilde c$
The next statement on the asymptotic normality of the estimator follows from the proof of Theorem 8(b) in [6], where, instead of condition (vi), the stronger assumption that $\tilde c_1$ is symmetrically distributed was imposed; however, the proof of Theorem 8(b) in [6] still works under the weaker condition (vi).

Theorem 4.
Assume conditions (i) and (iii)-(vi). Then: where $\Gamma$ is a Gaussian centered random element with matrix components, $V_A$ is from condition (iii), and $\Gamma_i$ is from relation (2.12).

Remark 5.
Under the assumptions of Theorem 4, the components of random element (2.11) are uncorrelated, and therefore, the components of the limit element Γ are uncorrelated as well.
Let a consistent estimator $\hat f = \hat f_m$ of the vector $f$ be given. We want to construct a consistent estimator of matrix (2.16). The matrix $S(X_0, f)$ is expressed, for example, via the fourth moments of the errors $\tilde c_i$, and those moments cannot be consistently estimated without additional assumptions on the error probability distribution. Therefore, an explicit expression for the latter matrix does not help to construct the desired estimator. Nevertheless, we can construct something like the sandwich estimator [1, pp. 368-369].
The next statement on the consistency of the nuisance parameter estimators follows from the proof of Lemma 10 in [6]. Recall that the bar means averaging over the observations; see Section 1.

Lemma 6. Assume the conditions of Theorem 4. Define the estimators:
The next asymptotic expansion of the TLS estimator is presented in [6], formulas (4.10) and (4.11).

Lemma 7. Under the conditions of Theorem 4, we have:
In view of Lemma 7, introduce the sandwich estimator $\hat S(\hat f)$ of the matrix (2.16): where the estimator $\hat V_A$ is given in (2.18).
Theorem 8. Let $f \in \mathbb{R}^{n\times 1}$, and let $\hat f$ be a consistent estimator of this vector. Under the conditions of Theorem 4, the statistic $\hat S(\hat f)$ is a consistent estimator of the matrix $S(X_0, f)$. The Appendix contains the proof of this theorem and of all further statements.
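The exact formula for $\hat S(\hat f)$ is not reproduced in this copy; the sketch below only illustrates the generic sandwich shape $\hat A^{-1} \hat B \hat A^{-\top}$ from [1], with hypothetical inputs `scores` (per-observation estimating-function values) and `bread` (estimated mean derivative of the estimating function):

```python
import numpy as np

def sandwich(scores, bread):
    """Generic sandwich covariance estimate A^{-1} B A^{-T}.
    scores -- (m, k) array of per-observation estimating-function values,
    bread  -- (k, k) estimate of the mean derivative of the estimating function."""
    m = scores.shape[0]
    meat = scores.T @ scores / m   # empirical second moment of the scores
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv.T
```

The key point, as in the discussion before Theorem 8, is that the "meat" is estimated empirically from the scores, so no explicit fourth-moment formula for the errors is needed.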

Construction of the goodness-of-fit test
For the observation model (2.4), we test the following hypotheses concerning the response $b$ and the latent variable $a^0$: In fact, the null hypothesis means that the observation model (1.3)-(1.4) holds. Based on the observations $a_i$, $b_i$, $i = 1, \dots, m$, we want to construct a test statistic to check this hypothesis. We need the following stabilization condition on the latent variable: To ensure the nonsingularity of the matrix $\Sigma_T$, we impose a final restriction on the observation model:

Lemma 10. Assume conditions (i) and (iii)-(vii). Then
(viii) There exists a finite matrix limit and, moreover, the matrix S a is nonsingular.

Remark 12. Assume conditions (vii) and (viii). Then
and $V_A$ is nonsingular as a sum of a positive definite and a positive semidefinite matrix. Thus, condition (iii) is a consequence of assumptions (vii) and (viii).
Lemma 13. Assume conditions (i) and (iv)-(viii). Then: With probability tending to one as $m \to \infty$, the symmetric matrix $\hat\Sigma_T$ is positive definite as well.
For $m \ge 1$ and $\omega$ from the underlying probability space $\Omega$ such that $\hat\Sigma_T$ is positive definite, we define the test statistic (3.7). Lemmas 10 and 11(b) imply the following convergence of the test statistic.
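Formula (3.7) is not displayed in this copy; assuming the usual quadratic-form shape $T_m^2 = m\, T_m^{0\top} \hat\Sigma_T^{-1} T_m^0$ compared with a $\chi^2_d$ critical value, a decision rule can be sketched as follows (the tabulated numbers are standard $\chi^2$ 0.95 quantiles, hard-coded so that only numpy is used):

```python
import numpy as np

# Tabulated 0.95 quantiles of chi-squared with d degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def gof_statistic(T0, Sigma_hat, m):
    """Quadratic form m * T0^T Sigma_hat^{-1} T0 (assumed shape of (3.7))."""
    return float(m * T0 @ np.linalg.solve(Sigma_hat, T0))

def reject_null(T0, Sigma_hat, m, d, level=0.05):
    """Reject H0 at the given level by comparison with the chi2_d quantile."""
    if level != 0.05:
        raise ValueError("only 0.95 quantiles are tabulated in this sketch")
    return gof_statistic(T0, Sigma_hat, m) > CHI2_95[d]
```

Under the null hypothesis the statistic is asymptotically $\chi^2_d$, so this rule has asymptotic level 0.05.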

Power of the test
Consider a sequence of local alternative models. Here $g \colon \mathbb{R}^n \to \mathbb{R}^d$ is a given nonlinear perturbation of the linear regression function.
For an arbitrary function $f(a^0)$, denote the limit of averages
$$M(f(a^0)) := \lim_{m\to\infty} \frac{1}{m} \sum_{i=1}^m f(a_i^0),$$
provided that the limit exists and is finite. In order to study the behavior of the test statistic under the local alternatives $H_{1,m}$, we impose two restrictions on the perturbation function $g$:
(ix) There exist $M(g(a^0))$ and $M(g(a^0)\,a^{0\top})$.
Under the local alternatives $H_{1,m}$, we ensure the weak consistency and asymptotic normality of the TLS estimator $\hat X$.

Lemma 16. Assume the conditions of Lemma 15. Then under local alternatives
where $\Sigma_T$ is given by (3.5), and $C_T$ is given in (4.2).
Theorem 18 makes it possible to find the asymptotic power of the test under the local alternatives $H_{1,m}$. The asymptotic power is an increasing function of $\tau = \|\Sigma_T^{-1/2} C_T\|$: the larger $\tau$, the more powerful the test.
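The asymptotic power under $H_{1,m}$ can be approximated by simulating the noncentral chi-squared limit. The sketch below assumes the noncentrality enters through $\tau = \|\Sigma_T^{-1/2} C_T\|$, i.e., the limiting statistic is $\|Z + \mu\|^2$ with $Z \sim N(0, I_d)$ and $\|\mu\| = \tau$:

```python
import numpy as np

def asymptotic_power(tau, d, crit, n_sim=200_000, seed=0):
    """Monte Carlo approximation of P(||Z + mu||^2 > crit), where
    Z ~ N(0, I_d) and ||mu|| = tau -- the noncentral chi-squared limit
    of the test statistic under local alternatives (assumed convention)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[0] = tau                       # only the norm of mu matters
    Z = rng.normal(size=(n_sim, d))
    stat = ((Z + mu) ** 2).sum(axis=1)
    return float((stat > crit).mean())
```

With the critical value 5.991 (the $\chi^2_2$ 0.95 quantile), the approximated power equals the level at $\tau = 0$ and increases monotonically in $\tau$, matching the statement above.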

Conclusion
We constructed a goodness-of-fit test for a multivariate linear errors-in-variables model, provided that the errors are uncorrelated with equal (unknown) variances and vanishing third moments. The latter moment assumption makes it possible to consistently estimate the asymptotic covariance matrix $\Sigma_T$ of the statistic $T_m^0$ and to construct the test statistic $T_m^2$, which has the asymptotic $\chi^2_d$ distribution under the null hypothesis. Local alternatives $H_{1,m}$ are presented under which the test statistic has a noncentral $\chi^2_d(\tau)$ asymptotic distribution. The larger $\tau$, the larger the asymptotic power of the test.
In the future, we will try to construct, as in [5], a more powerful test by using within the test statistic the exponential weight function $\omega_\lambda(a) = e^{\lambda^\top a}$, $\lambda \in \mathbb{R}^{n\times 1}$.
To this end, it is necessary to require the independence of the errors $\tilde b_i$ and $\tilde a_i$ and also the existence of exponential moments of the errors $\tilde a_i$. This is the price for the greater power of the test.

Appendix
By the mentioned theorem from [8] the presented bounds imply the desired convergence.
The next statement is a version of the Lyapunov CLT.

Lemma 20. Let {z i } be a sequence of independent centered random vectors in
Proof of Theorem 8. (a) We have: In the proof of Theorem 8(a) in [6], the following expansion of the estimating function is used: where $W_{ij}$ are the components of the matrix collection (2.11). We show that the term in parentheses on the right-hand side of (5.3) tends to zero in probability. Taking into account expansion (5.5), we write down one of the summands of the expression $S(f)$: Let us explain why It suffices to consider the matrix Up to a constant, its entries contain summands of the form Applying Lemma 19 to the expression and for $\delta$ from condition (v), we have: Thus, by Lemma 19, expression (5.8) tends to zero in probability. Then, whence we get (5.7).
In a similar way, the other summands of $S(f)$ can be studied, and therefore, Next, we verify directly the convergence Therefore, $S(f) \xrightarrow{\mathrm{P}} S(X_0, f)$. (b) In view of Theorem 2 and the consistency of the estimators $\hat V_A$ and $\hat f$, the following convergences can be shown without difficulty: Here $Z$ is the matrix from relations (5.6). The desired convergence follows from the convergences established in parts (a) and (b) of the proof.

Proof of Lemma 9. For model (2.3)-(2.4), we have:
Proof of Lemma 10. By Theorem 4, Therefore, expansion (3.4) and condition (vii) imply that Next, by expansion (2.20), we get: The random vectors satisfy condition (5.2) with the number $\delta$ from assumptions (iv) and (v). Let us find the variance-covariance matrix $\Sigma_i$ of vector (5.13). We have Here (see (2.11) and (5.5)) $\mu_a$, $\sigma^2 I_d + X_0^\top X_0$, and this coincides with the right-hand side of equality (3.5).
Finally, the desired convergence follows from expansion (5.12) by Lemma 20 and Slutsky's lemma.
Proof of Lemma 11. The convergence $\hat\mu_a \xrightarrow{\mathrm{P1}} \mu_a$ is established by the SLLN. The convergence $\hat\Sigma_T \xrightarrow{\mathrm{P}} \Sigma_T$ follows from Theorem 8 (the role of $f$ and $\hat f$ is played by $\mu_a$ and $\hat\mu_a$, respectively) and the consistency of the estimators $\hat\sigma^2$, $\hat\mu_a$, and $\hat V_A$.
Proof of Lemma 13. (a) Hereafter, for symmetric matrices A and B, notation A ≥ B (A > B) means that the matrix A − B is positive semidefinite (positive definite).
Condition (vi) ensures the independence of the matrix components $\Gamma_i$ in relation (2.12). Therefore, From equality (3.5) we have In the case $\mu_a = 0$, we get $\Sigma_T \ge \sigma^2 I_d > 0$; in the case $\mu_a \ne 0$, we put $z = V_A^{-1} \mu_a$ and obtain: thus, $1 > \mu_a^\top V_A^{-1} \mu_a$, and inequality (5.17) implies $\Sigma_T > 0$. Statement (b) follows from statement (a) and Lemma 11(b).
Proof of Lemma 15. (a) The local alternative (4.1) corresponds to the perturbation matrix Model (4.1) can be rewritten as a perturbed model (2.1), or as a perturbed model (2.2). Introduce the symmetric matrix Due to condition (iii), as $m \to \infty$, Consider two matrices of size $(n+d) \times (n+d)$: In view of the proof of Theorem 4.1 in [9], for the convergence it suffices to show that, as $m \to \infty$, or, taking into account (5.21), that We study the most interesting summands, those that contain $G_0$ (the convergence of the other summands was shown in the proof of Theorem 4.1 in [9]). We have We established the convergence in probability for the summands from (5.26) and (5.27) that contain the perturbation $G_0$. Therefore, (5.26) and (5.27) are satisfied, relation (5.25) is satisfied as well, and the results of [9] imply convergence (5.24).
The consistency of the estimator $\hat\sigma^2$ under the local alternatives $H_{1,m}$ is established by formula (2.17) and boils down to the consistency of $\hat\sigma^2$ under the null hypothesis: the consistency of $\hat X$ has already been proven, and, moreover, Thus, estimator (3.6) does converge in probability to matrix (3.5).