Asymptotic normality of total least squares estimator in a multivariate errors-in-variables model $AX=B$

We consider a multivariate functional measurement error model $AX\approx B$. The errors in $[A,B]$ are uncorrelated, row-wise independent, and have equal (unknown) variances. We study the total least squares estimator of $X$, which, in the case of normal errors, coincides with the maximum likelihood one. We give conditions for asymptotic normality of the estimator when the number of rows in $A$ is increasing. Under mild assumptions, the covariance structure of the limit Gaussian random matrix is nonsingular. For normal errors, the results can be used to construct an asymptotic confidence interval for a linear functional of $X$.


Introduction
We deal with an overdetermined system of linear equations $AX \approx B$, which is common in linear parameter estimation problems [9]. If the data matrix $A$ and observation matrix $B$ are contaminated with errors, and all the errors are uncorrelated and have equal variances, then the total least squares (TLS) technique is appropriate for solving this system [9]. Kukush and Van Huffel [5] showed the statistical consistency of the TLS estimator $\hat{X}_{tls}$ as the number $m$ of rows in $A$ grows, provided that the errors in $[A, B]$ are row-wise i.i.d. with zero mean and covariance matrix proportional to the unit matrix; the covariance matrix was assumed to be known up to a factor of proportionality, and the true input matrix $A_0$ was supposed to be nonrandom. In fact, in [5] a more general, element-wise weighted TLS estimator was studied, where the errors in $[A, B]$ were row-wise independent, but within each row, some entries could be observed without errors, and, additionally, the error covariance matrix could differ from row to row. In [6], an iterative numerical procedure was developed to compute the element-wise weighted TLS estimator, and the rate of convergence of the procedure was established.
In the univariate case, where $B$ and $X$ are column vectors, the asymptotic normality of $\hat{X}_{tls}$ was shown by Gallo [4] as $m$ grows. In [7], that result was extended to mixing error sequences. Both [4] and [7] utilized an explicit form of the TLS solution.
In the present paper, we extend Gallo's asymptotic normality result to the multivariate case, where $A$, $X$, and $B$ are matrices. Now a closed-form solution is unavailable, and we work instead with the cost function. More precisely, we deal with the estimating function, which is a matrix derivative of the cost function. In fact, we show that under mild conditions, the normalized estimator converges in distribution to a Gaussian random matrix with nonsingular covariance structure. For normal errors, the latter structure can be estimated consistently based on the observed matrix $[A, B]$. The results can be used to construct the asymptotic confidence ellipsoid for a vector $Xu$, where $u$ is a column vector of the corresponding dimension.
The paper is organized as follows. In Section 2, we describe the model, refer to the consistency result for the estimator, and present the objective function and the corresponding matrix estimating function. In Section 3, we state the asymptotic normality of $\hat{X}_{tls}$ and provide a nonsingular covariance structure for the limit random matrix. The latter structure depends continuously on some nuisance parameters of the model, and we derive consistent estimators for those parameters. Section 4 concludes. The proofs are given in the Appendix. There we work with the estimating function and derive an expansion for the normalized estimator using Taylor's formula. The expansion holds with probability tending to 1.
Throughout the paper, all vectors are column vectors, $E$ stands for the expectation and acts as an operator on the total product, $\operatorname{cov}(x)$ denotes the covariance matrix of a random vector $x$, and for a sequence of random matrices $\{X_m, m \ge 1\}$ of the same size, the notation $X_m = O_p(1)$ means that the sequence $\{\|X_m\|\}$ is stochastically bounded, and $X_m = o_p(1)$ means that $X_m \xrightarrow{P} 0$. By $I_p$ we denote the unit matrix of size $p$.

The TLS problem
Consider the model $AX \approx B$. Here $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{m\times d}$ are observations, and $X \in \mathbb{R}^{n\times d}$ is a parameter of interest. Assume that
$$ A = A_0 + \tilde{A}, \qquad B = B_0 + \tilde{B}, \tag{2.1} $$
and that there exists $X_0 \in \mathbb{R}^{n\times d}$ such that
$$ A_0 X_0 = B_0. \tag{2.2} $$
Here $A_0$ is the nonrandom true input matrix, $B_0$ is the true output matrix, and $\tilde{A}$, $\tilde{B}$ are error matrices. The matrix $X_0$ is the true value of the parameter.
We can rewrite the model (2.1)-(2.2) as a classical functional errors-in-variables (EIV) model with vector regressor and vector response [3]. Denote by $a_i^T$, $a_{0i}^T$, $\tilde{a}_i^T$, $b_i^T$, $b_{0i}^T$, and $\tilde{b}_i^T$ the rows of $A$, $A_0$, $\tilde{A}$, $B$, $B_0$, and $\tilde{B}$, respectively, $i = 1, \dots, m$. Then the model considered is equivalent to the following EIV model:
$$ a_i = a_{0i} + \tilde{a}_i, \qquad b_i = X_0^T a_{0i} + \tilde{b}_i, \qquad i = 1, \dots, m. \tag{2.3} $$
Based on the observations $a_i$, $b_i$, $i = 1, \dots, m$, we have to estimate $X_0$. The vectors $a_{0i}$ are nonrandom and unknown, and the vectors $\tilde{a}_i$, $\tilde{b}_i$ are random errors. We state a global assumption of the paper.
(i) The vectors $\tilde{z}_i := [\tilde{a}_i^T, \tilde{b}_i^T]^T$, $i = 1, 2, \dots$, are i.i.d. with zero mean and variance-covariance matrix
$$ \operatorname{cov}(\tilde{z}_1) = \sigma^2 I_{n+d}, $$
where the factor of proportionality $\sigma^2$ is positive and unknown.
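For concreteness, the following minimal NumPy sketch generates data from the model (2.1)-(2.3) under condition (i); the dimensions, the design $A_0$, and the normal error law are illustrative assumptions, not requirements of the model (condition (i) only fixes the first two moments).

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, d = 500, 3, 2                     # sample size and dimensions (illustrative)
sigma = 0.1                             # unknown in practice; fixed here to simulate

X0 = rng.standard_normal((n, d))        # true parameter X_0
A0 = rng.uniform(-1.0, 1.0, (m, n))     # nonrandom true input matrix A_0
B0 = A0 @ X0                            # true output matrix, relation (2.2)

# Row-wise i.i.d. errors with covariance sigma^2 * I_{n+d}, as in condition (i)
Z = sigma * rng.standard_normal((m, n + d))
A = A0 + Z[:, :n]                       # observed input, relation (2.1)
B = B0 + Z[:, n:]                       # observed output, relation (2.1)
```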
The TLS problem consists in finding the values of the disturbances $\Delta A$ and $\Delta B$ minimizing the sum of squared corrections,
$$ \min_{X \in \mathbb{R}^{n\times d},\ \Delta A,\ \Delta B} \|[\Delta A, \Delta B]\|_F^2, \tag{2.4} $$
subject to the constraints
$$ (A - \Delta A) X = B - \Delta B. \tag{2.5} $$
Here in (2.4), for a matrix $C = (c_{ij})$, $\|C\|_F$ denotes the Frobenius norm, $\|C\|_F^2 = \sum_{i,j} c_{ij}^2$. Later on, we will also use the operator norm $\|C\| = \sup_{x \neq 0} \frac{\|Cx\|}{\|x\|}$.
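The minimizer of (2.4)-(2.5) can be computed by the classical SVD-based construction (see [9]): take the singular value decomposition of the compound matrix $[A, B]$ and read the solution off the right singular vectors corresponding to the $d$ smallest singular values. A minimal sketch, continuing the simulation above; the rank test mirrors the fact, discussed below, that the problem may have no solution:

```python
def tls_estimator(A, B):
    """SVD-based TLS solution of AX ~ B; returns None when no solution exists."""
    n, d = A.shape[1], B.shape[1]
    _, _, Vt = np.linalg.svd(np.concatenate([A, B], axis=1))
    V = Vt.T                 # right singular vectors as columns
    V12 = V[:n, n:]          # top-right n x d block
    V22 = V[n:, n:]          # bottom-right d x d block
    if np.linalg.matrix_rank(V22) < d:
        return None          # TLS problem (2.4)-(2.5) has no solution
    return -V12 @ np.linalg.inv(V22)

X_tls = tls_estimator(A, B)
```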

TLS estimator and its consistency
It may happen that, for some random realization, problem (2.4)-(2.5) has no solution.
In such a case, put $\hat{X}_{tls} = \infty$. Thus, formally, the TLS estimator $\hat{X}_{tls}$ is a Borel measurable function of the observations $[A, B]$ that solves problem (2.4)-(2.5) whenever a solution exists, and equals $\infty$ otherwise. We need the following conditions for the consistency of $\hat{X}_{tls}$.
(ii) $E\|\tilde{z}_1\|^4 < \infty$.
(iii) $\frac{1}{m} \sum_{i=1}^m a_{0i} a_{0i}^T \to V_A$ as $m \to \infty$, where $V_A$ is a nonsingular matrix.
The next consistency result is contained in Theorem 4(a) of [5].

Theorem 2.
Assume conditions (i) to (iii). Then $\hat{X}_{tls}$ is finite with probability tending to one, and $\hat{X}_{tls}$ tends to $X_0$ in probability as $m \to \infty$.

The objective and estimating functions

Denote
$$ q(a, b; X) := (X^T a - b)^T (I_d + X^T X)^{-1} (X^T a - b), \tag{2.6} $$
$$ Q(X) := \sum_{i=1}^m q(a_i, b_i; X). \tag{2.7} $$
The TLS estimator is known to minimize the objective function (2.7); see [8] or formula (24) in [5].

Lemma 3.
The TLS estimator $\hat{X}_{tls}$ is finite if and only if there exists an unconstrained minimum of the function (2.7), and then $\hat{X}_{tls}$ is a minimum point of that function.
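To illustrate Lemma 3 numerically, the sketch below implements the loss (2.6) and the objective (2.7) and checks that the SVD-based solution attains a lower value of $Q$ than nearby points; the random-perturbation test is only a heuristic check, not a proof of minimality.

```python
def loss_q(a, b, X):
    """Row-wise loss (2.6): (X^T a - b)^T (I_d + X^T X)^{-1} (X^T a - b)."""
    r = X.T @ a - b
    M = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)
    return float(r @ M @ r)

def objective_Q(A, B, X):
    """Objective function (2.7): the sum of the row-wise losses."""
    return sum(loss_q(a, b, X) for a, b in zip(A, B))

Q_min = objective_Q(A, B, X_tls)
for _ in range(20):
    X_pert = X_tls + 0.01 * rng.standard_normal(X_tls.shape)
    assert objective_Q(A, B, X_pert) > Q_min   # X_tls minimizes (2.7)
```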
Introduce an estimating function related to the loss function (2.6):
$$ s(a, b; X) := a (a^T X - b^T)(I_d + X^T X)^{-1} - X (I_d + X^T X)^{-1} (X^T a - b)(X^T a - b)^T (I_d + X^T X)^{-1}. \tag{2.8} $$
Expression (2.8) as a function of $X$ is a mapping in $\mathbb{R}^{n\times d}$. Its derivative $s'_X$ is a linear operator in this space.

Corollary 4.
(a) With probability tending to one, the estimator $\hat{X}_{tls}$ satisfies the estimating equation $\sum_{i=1}^m s(a_i, b_i; \hat{X}_{tls}) = 0$.
(b) Under condition (i), $E\, s(a_i, b_i; X_0) = 0$ for each $i \ge 1$.
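A direct implementation of (2.8), together with a numerical check of the estimating equation of Corollary 4(a) at the computed estimator, might look as follows (continuing the sketches above):

```python
def s_func(a, b, X):
    """Estimating function (2.8); equals half the matrix gradient of the loss (2.6)."""
    M = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)
    r = X.T @ a - b                              # residual X^T a - b
    return np.outer(a, r) @ M - X @ M @ np.outer(r, r) @ M

S = sum(s_func(a, b, X_tls) for a, b in zip(A, B))
assert np.allclose(S, 0.0, atol=1e-7)            # Corollary 4(a), up to rounding
```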

Lemma 5.
Under condition (i), for each $H \in \mathbb{R}^{n\times d}$ and $i \ge 1$, we have
$$ E\, s'_X(a_i, b_i; X_0)\, H = a_{0i}\, a_{0i}^T\, H\, (I_d + X_0^T X_0)^{-1}. \tag{2.9} $$
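A Monte Carlo check of (2.9) as stated above: approximate $s'_X(a_i, b_i; X_0) H$ by central differences of the function s_func from the previous sketch, average over simulated errors, and compare with the right-hand side; the deviation should be of the order of the Monte Carlo error.

```python
def s_prime_H(a, b, X, H, eps=1e-6):
    """Directional derivative s'_X(a, b; X) H via central finite differences."""
    return (s_func(a, b, X + eps * H) - s_func(a, b, X - eps * H)) / (2 * eps)

H = rng.standard_normal((n, d))
a0 = A0[0]                                   # a fixed true input row a_{01}
M = np.linalg.inv(np.eye(d) + X0.T @ X0)
mc = np.mean([s_prime_H(a0 + sigma * rng.standard_normal(n),
                        X0.T @ a0 + sigma * rng.standard_normal(d),
                        X0, H)
              for _ in range(20000)], axis=0)
print(np.abs(mc - np.outer(a0, a0) @ H @ M).max())   # small, ~ Monte Carlo error
```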

Main results
Introduce further assumptions to state the asymptotic normality of $\hat{X}_{tls}$. We need slightly higher moments compared with conditions (ii) and (iii) in order to use the Lyapunov CLT. Recall that $\tilde{z}_i$ satisfies condition (i).
(iv) For some $\delta > 0$, $E\|\tilde{z}_1\|^{4+2\delta} < \infty$.
(v) For $\delta$ from condition (iv), $\frac{1}{m^{1+\delta/2}} \sum_{i=1}^m \|a_{0i}\|^{2+\delta} \to 0$ as $m \to \infty$.
(vi) There exists a finite limit $\mu_a := \lim_{m\to\infty} \frac{1}{m} \sum_{i=1}^m a_{0i}$.
(vii) The distribution of $\tilde{z}_1$ is symmetric around the origin.
Introduce a random element in the space of systems consisting of five matrices:
$$ W_i := \big(\tilde{a}_i a_{0i}^T,\ \tilde{a}_i \tilde{a}_i^T - \sigma^2 I_n,\ a_{0i} \tilde{b}_i^T,\ \tilde{a}_i \tilde{b}_i^T,\ \tilde{b}_i \tilde{b}_i^T - \sigma^2 I_d\big), \qquad i \ge 1. \tag{3.1} $$
Hereafter $\xrightarrow{d}$ stands for the convergence in distribution.

Lemma 6.
Assume conditions (i) to (vi). Then
$$ \frac{1}{\sqrt{m}} \sum_{i=1}^m W_i \xrightarrow{d} \Gamma = (\Gamma_1, \dots, \Gamma_5), \tag{3.2} $$
where $\Gamma$ is a Gaussian centered random element with matrix components.

Lemma 7.
Assume conditions (i) to (v) and (vii). Then the convergence (3.2) still holds, and the components $\Gamma_1, \dots, \Gamma_5$ of $\Gamma$ are independent.

Theorem 8.
(a) Assume conditions (i) to (vi). Then
$$ \sqrt{m}\,\big(\hat{X}_{tls} - X_0\big) \xrightarrow{d} X_\infty := V_A^{-1}\, \Gamma(X_0), \tag{3.3} $$
where, with $M := (I_d + X_0^T X_0)^{-1}$,
$$ \Gamma(X_0) := \Gamma_1^T X_0 + \Gamma_2 X_0 - \Gamma_3 - \Gamma_4 - X_0 M \big(X_0^T \Gamma_2 X_0 - X_0^T \Gamma_4 - \Gamma_4^T X_0 + \Gamma_5\big). \tag{3.4} $$
(b) Assume conditions (i) to (v) and (vii). Then (3.3) still holds, and, moreover, the limit random matrix $X_\infty := V_A^{-1}\Gamma(X_0)$ has a nonsingular covariance structure, that is, for each nonzero vector $u \in \mathbb{R}^{d\times 1}$, $\operatorname{cov}(X_\infty u)$ is a nonsingular matrix.
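As a sanity check on Theorem 8, one can simulate many replications of the model and inspect the empirical distribution of $\sqrt{m}(\hat{X}_{tls} - X_0)$. A minimal Monte Carlo sketch, continuing the simulation above (normal errors, so part (b) applies; replications where the TLS problem has no solution are simply skipped):

```python
reps, devs = 2000, []
for _ in range(reps):
    Z = sigma * rng.standard_normal((m, n + d))
    Xh = tls_estimator(A0 + Z[:, :n], B0 + Z[:, n:])
    if Xh is not None:
        devs.append(np.sqrt(m) * (Xh - X0))
devs = np.asarray(devs)
print(devs.mean(axis=0))   # entrywise means: close to the zero matrix
print(devs.std(axis=0))    # entrywise standard deviations of the limit law
```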

Remark 9.
Conditions of Theorem 8(a) are similar to Gallo's conditions [4] for the asymptotic normality in the univariate case; see also [9], pp. 240-243. Compared with Theorems 2.3 and 2.4 of [7], stated for the univariate case with mixing errors, we do not need the requirement that the entries of the true input matrix $A_0$ be totally bounded.
In [7], Section 2, one can find a discussion of the importance of the asymptotic normality result for $\hat{X}_{tls}$. It is claimed there that the formula for the asymptotic covariance structure of $\hat{X}_{tls}$ is computationally useless, but in the case where the limit distribution is nonsingular, block-bootstrap techniques can be used for constructing confidence intervals and testing hypotheses.
However, in the case of normal errors $\tilde{z}_i$, we can apply Theorem 8(b) to construct the asymptotic confidence ellipsoid, say, for $X_0 u$, $u \in \mathbb{R}^{d\times 1}$, $u \neq 0$. Indeed, relations (3.1)-(3.4) show that the nonsingular matrix $S_u := \operatorname{cov}(X_\infty u)$ is a continuous function $S_u = S_u(X_0, V_A, \sigma^2)$ of the unknown parameters $X_0$, $V_A$, and $\sigma^2$. (It is important here that now the components $\Gamma_j$ of $\Gamma$ are independent, and the covariance structure of each $\Gamma_j$ depends on $\sigma^2$ and $V_A$, not on some other limit characteristics of $A_0$; see Lemma 6.) Once we possess consistent estimators $\hat{V}_A$ and $\hat{\sigma}^2$ of $V_A$ and $\sigma^2$, the matrix $\hat{S}_u := S_u(\hat{X}_{tls}, \hat{V}_A, \hat{\sigma}^2)$ is a consistent estimator of the covariance matrix $S_u$. Hereafter, a bar means averaging over the rows $i = 1, \dots, m$; for example, $\overline{a a^T} := \frac{1}{m} \sum_{i=1}^m a_i a_i^T$.

Lemma 10. Assume the conditions of Theorem 2. Define
$$ \hat{\sigma}^2 := \frac{1}{md} \sum_{i=1}^m q(a_i, b_i; \hat{X}_{tls}) = \frac{Q(\hat{X}_{tls})}{md}, \qquad \hat{V}_A := \overline{a a^T} - \hat{\sigma}^2 I_n. \tag{3.5} $$
Then $\hat{\sigma}^2 \xrightarrow{P} \sigma^2$ and $\hat{V}_A \xrightarrow{P} V_A$ as $m \to \infty$.
Remark 11. Estimator (3.5) is a multivariate analogue of the maximum likelihood estimator (1.53) in [2] for the functional scalar EIV model.
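In code, the estimators (3.5) are one line each (continuing the sketches above):

```python
sigma2_hat = objective_Q(A, B, X_tls) / (m * d)    # consistent for sigma^2
V_A_hat = (A.T @ A) / m - sigma2_hat * np.eye(n)   # consistent for V_A
```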
Finally, for the case $\tilde{z}_1 \sim N(0, \sigma^2 I_{n+d})$, based on Lemma 10 and the relations
$$ \sqrt{m}\,\big(\hat{X}_{tls} - X_0\big)\, u \xrightarrow{d} N(0, S_u), \qquad \hat{S}_u \xrightarrow{P} S_u, $$
we can construct the asymptotic confidence ellipsoid for the vector $X_0 u$ in a standard way.
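A sketch of the resulting procedure: the ellipsoid consists of all $\theta \in \mathbb{R}^n$ with $m\, (\hat{X}_{tls} u - \theta)^T \hat{S}_u^{-1} (\hat{X}_{tls} u - \theta) \le \chi^2_{n, 1-\alpha}$. Since the closed form of $S_u(\cdot)$ is not reproduced here, the code below approximates $S_u$ by a parametric bootstrap under the fitted normal error law, with $A$ standing in for the unknown $A_0$; this stand-in is a heuristic in the spirit of the resampling mentioned in [7], not the plug-in estimator $\hat{S}_u$.

```python
from scipy import stats

u = np.zeros(d); u[0] = 1.0                    # direction of interest (illustrative)
alpha = 0.05
boot = []
for _ in range(1000):                          # bootstrap under fitted parameters
    Z = np.sqrt(sigma2_hat) * rng.standard_normal((m, n + d))
    Xb = tls_estimator(A + Z[:, :n], A @ X_tls + Z[:, n:])
    if Xb is not None:
        boot.append(np.sqrt(m) * (Xb - X_tls) @ u)
S_u_boot = np.cov(np.asarray(boot), rowvar=False)     # approximates S_u
center, radius = X_tls @ u, stats.chi2.ppf(1 - alpha, df=n)
# Ellipsoid: {theta : m * (center - theta)^T S_u_boot^{-1} (center - theta) <= radius}
```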

Remark 12.
In a similar way, a confidence ellipsoid can be constructed for any finite set of linear combinations of the entries of $X_0$ with fixed known coefficients.

Conclusion
We extended the result of Gallo [4] and proved the asymptotic normality of the TLS estimator in a multivariate model $AX \approx B$. The normalized estimator converges in distribution to a random matrix with quite complicated covariance structure. If the error distribution is symmetric around the origin, then the latter covariance structure is nonsingular. For the case of normal errors, this makes it possible to construct the asymptotic confidence region for a vector $X_0 u$, $u \in \mathbb{R}^{d\times 1}$, where $X_0$ is the true value of $X$.
In future papers, we will extend the result to the element-wise weighted TLS estimator [5] in the model $AX \approx B$, where some columns of the matrix $[A, B]$ may be observed without errors and, in addition, the error covariance matrix may differ from row to row.

Proof of Corollary 4
(a) For any $n$ and $d$, the space $\mathbb{R}^{n\times d}$ is endowed with the natural inner product $\langle A, B \rangle = \operatorname{tr}(A B^T)$ and the Frobenius norm. The matrix derivative $q'_X$ of the functional (2.6) is a linear functional on $\mathbb{R}^{n\times d}$, which can be identified with a certain matrix from $\mathbb{R}^{n\times d}$ based on the inner product.
Using the rules of matrix calculus [1], we have, for $H \in \mathbb{R}^{n\times d}$, with $v := X^T a - b$ and $M := (I_d + X^T X)^{-1}$:
$$ q'_X(a, b; X)\, H = 2\, v^T M (H^T a) - v^T M (H^T X + X^T H) M v. $$
Collecting similar terms, we obtain:
$$ q'_X(a, b; X)\, H = 2\, \big\langle a v^T M - X M v v^T M,\ H \big\rangle. $$
Using the inner product in $\mathbb{R}^{n\times d}$, we get that $\frac{1}{2} q'_X$ is identified with the matrix $s(a, b; X)$ on the left-hand side of (2.8). In view of Theorem 2 and Lemma 3, this implies the statement of Corollary 4(a).
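The identity $s = \frac{1}{2} q'_X$ can be verified numerically by comparing (2.8) with central finite differences of the loss (2.6); a quick check using the functions defined in the earlier sketches:

```python
def num_grad_q(a, b, X, eps=1e-6):
    """Central finite-difference gradient of the loss (2.6) with respect to X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X); E[i, j] = eps
            G[i, j] = (loss_q(a, b, X + E) - loss_q(a, b, X - E)) / (2 * eps)
    return G

X_test = rng.standard_normal((n, d))
a, b = A[0], B[0]
assert np.allclose(num_grad_q(a, b, X_test), 2 * s_func(a, b, X_test), atol=1e-5)
```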
(b) Now, we set
$$ a = a_0 + \tilde{a}, \tag{4.1} $$
where $a_0$ is a nonrandom vector, and, like in (2.3),
$$ b = X_0^T a_0 + \tilde{b}. \tag{4.2} $$
Then $X_0^T a - b = X_0^T \tilde{a} - \tilde{b}$, and condition (i) yields
$$ E\, a (X_0^T a - b)^T = E\, \tilde{a} \tilde{a}^T X_0 = \sigma^2 X_0, \tag{4.3} $$
$$ E (X_0^T a - b)(X_0^T a - b)^T = \sigma^2 (I_d + X_0^T X_0). \tag{4.4} $$
Therefore (see (2.8)),
$$ E\, s(a, b; X_0) = \sigma^2 X_0 (I_d + X_0^T X_0)^{-1} - \sigma^2 X_0 (I_d + X_0^T X_0)^{-1} = 0. $$
This implies the statement of Corollary 4(b).

Proof of Lemma 5
The derivative $s'_X$ of the function (2.8) is a linear operator in $\mathbb{R}^{n\times d}$. For $H \in \mathbb{R}^{n\times d}$, with $v := X^T a - b$ and $M := (I_d + X^T X)^{-1}$, we have:
$$ \begin{aligned} s'_X(a, b; X)\, H = {}& a a^T H M - a v^T M (H^T X + X^T H) M - H M v v^T M \\ &+ X M (H^T X + X^T H) M v v^T M - X M (H^T a\, v^T + v\, a^T H) M \\ &+ X M v v^T M (H^T X + X^T H) M. \end{aligned} \tag{4.5} $$
As before, we set (4.1), (4.2) and use relations (4.3), (4.4), and the relation $E\, a a^T = a_0 a_0^T + \sigma^2 I_n$. We obtain:
$$ E\, s'_X(a, b; X_0)\, H = a_0 a_0^T H (I_d + X_0^T X_0)^{-1}. $$
This implies (2.9).

Proof of Lemma 6
The random elements $W_i$, $i \ge 1$, in (3.1) are independent and centered. We want to apply the Lyapunov CLT to the left-hand side of (3.2).
(a) All the second moments of $m^{-1/2} \sum_{i=1}^m W_i$ converge to finite limits. For example, for the first component, we have
$$ E \Big\langle H_1,\ \frac{1}{\sqrt{m}} \sum_{i=1}^m \tilde{a}_i a_{0i}^T \Big\rangle^2 = \frac{\sigma^2}{m} \sum_{i=1}^m \|H_1 a_{0i}\|^2, $$
and this has a finite limit due to assumption (iii). Here $H_1 \in \mathbb{R}^{n\times n}$, and we use the inner product introduced in the proof of Corollary 4. For the fifth component,
$$ E \Big\langle H_2,\ \frac{1}{\sqrt{m}} \sum_{i=1}^m \big(\tilde{b}_i \tilde{b}_i^T - \sigma^2 I_d\big) \Big\rangle^2 = E \big\langle H_2,\ \tilde{b}_1 \tilde{b}_1^T - \sigma^2 I_d \big\rangle^2 < \infty, $$
because the fourth moments of $\tilde{b}_i$ are finite. Here $H_2 \in \mathbb{R}^{d\times d}$. For mixed moments of the first and fifth components, we have
$$ E \Big\langle H_1,\ \frac{1}{\sqrt{m}} \sum_{i=1}^m \tilde{a}_i a_{0i}^T \Big\rangle \Big\langle H_2,\ \frac{1}{\sqrt{m}} \sum_{i=1}^m \big(\tilde{b}_i \tilde{b}_i^T - \sigma^2 I_d\big) \Big\rangle = \Big( \frac{1}{m} \sum_{i=1}^m a_{0i} \Big)^T H_1^T\, E \big[ \tilde{a}_1 \big\langle H_2,\ \tilde{b}_1 \tilde{b}_1^T - \sigma^2 I_d \big\rangle \big], $$
and this, due to condition (vi), converges toward
$$ \mu_a^T H_1^T\, E \big[ \tilde{a}_1 \big\langle H_2,\ \tilde{b}_1 \tilde{b}_1^T - \sigma^2 I_d \big\rangle \big]. \tag{4.6} $$
Other second moments can be considered in a similar way.
(b) The Lyapunov condition holds for each component of (3.1). Let $\delta$ be the quantity from assumptions (iv) and (v). For the first component, since $\|\tilde{a}_i a_{0i}^T\|_F = \|\tilde{a}_i\| \cdot \|a_{0i}\|$,
$$ \frac{1}{m^{1+\delta/2}} \sum_{i=1}^m E \|\tilde{a}_i a_{0i}^T\|_F^{2+\delta} = \frac{E \|\tilde{a}_1\|^{2+\delta}}{m^{1+\delta/2}} \sum_{i=1}^m \|a_{0i}\|^{2+\delta} \to 0 $$
as $m \to \infty$ by condition (v). For the fifth component,
$$ \frac{1}{m^{1+\delta/2}} \sum_{i=1}^m E \big\|\tilde{b}_i \tilde{b}_i^T - \sigma^2 I_d\big\|_F^{2+\delta} = \frac{E \big\|\tilde{b}_1 \tilde{b}_1^T - \sigma^2 I_d\big\|_F^{2+\delta}}{m^{\delta/2}} \to 0. $$
The latter expectation is finite by condition (iv). The Lyapunov condition for the other components is verified similarly.
(c) Parts (a) and (b) of the present proof imply (3.2) by the Lyapunov CLT.

Proof of Lemma 7
Under conditions (vii) and (i), all five components of $W_i$, given in (3.1), are uncorrelated (e.g., a cross-correlation like (4.6) equals zero, and condition (vi) is not needed). As in the proof of Lemma 6, the convergence (3.2) still holds. The components $\Gamma_1, \dots, \Gamma_5$ of $\Gamma$ are independent because the components of $W_i$ are uncorrelated.

Proof of Theorem 8(a)
Our reasoning is typical for the theory of generalized estimating equations, with the specific feature that a matrix rather than a vector parameter is estimated. By Corollary 4(a), with probability tending to 1, we have
$$ \sum_{i=1}^m s(a_i, b_i; \hat{X}_{tls}) = 0. \tag{4.7} $$
Now, we use Taylor's formula around $X_0$ with the remainder in the Lagrange form; see [1], Theorem 5.6.2. Denote
$$ \hat{\Delta} := \hat{X}_{tls} - X_0, \qquad y_m := \sum_{i=1}^m s(a_i, b_i; X_0). $$
Then (4.7) implies the relation
$$ y_m + \sum_{i=1}^m s'_X(a_i, b_i; X_0)\, \hat{\Delta} + \mathrm{rest}_1 = 0, \qquad \|\mathrm{rest}_1\| \le m\, \|\hat{\Delta}\|^2 \cdot O_p(1). \tag{4.8} $$
Here $O_p(1)$ is a factor of the form
$$ \frac{1}{m} \sum_{i=1}^m \sup_{\|X - X_0\| \le 1} \big\| s''_X(a_i, b_i; X) \big\|. \tag{4.9} $$
Relation (4.8) holds with probability tending to 1 because, due to Theorem 2, $\hat{X}_{tls} \xrightarrow{P} X_0$; expression (4.9) is indeed $O_p(1)$ because the derivative $s''_X$ is quadratic in $a_i$, $b_i$ (cf. (4.5)), and the averaged second moments of $[a_i^T, b_i^T]$ are assumed to be bounded. Now, $\|\mathrm{rest}_1\| / \sqrt{m} \le \sqrt{m}\, \|\hat{\Delta}\| \cdot o_p(1)$. Next, by Lemma 5 and condition (iii),
$$ \frac{1}{m} \sum_{i=1}^m s'_X(a_i, b_i; X_0)\, H = V_A H (I_d + X_0^T X_0)^{-1} + o_p(1), \qquad H \in \mathbb{R}^{n\times d}. \tag{4.10} $$
Therefore, (4.8) implies that
$$ \sqrt{m}\, V_A \hat{\Delta}\, (I_d + X_0^T X_0)^{-1} = -\frac{y_m}{\sqrt{m}} + \big(1 + \sqrt{m}\, \|\hat{\Delta}\|\big) \cdot o_p(1). \tag{4.11} $$
Now, we find the limit in distribution of $y_m / \sqrt{m}$. The summands in $y_m$ have zero expectation due to Corollary 4(b). Moreover (see (2.8)), with $M := (I_d + X_0^T X_0)^{-1}$,
$$ s(a_i, b_i; X_0) = \big( W_{i1}^T X_0 + W_{i2} X_0 - W_{i3} - W_{i4} \big) M - X_0 M \big( X_0^T W_{i2} X_0 - X_0^T W_{i4} - W_{i4}^T X_0 + W_{i5} \big) M. \tag{4.12} $$
Here $W_{ij}$ are the components of (3.1). By Lemma 6, we have (see (3.4))
$$ \frac{y_m}{\sqrt{m}} \xrightarrow{d} \Gamma(X_0)\, (I_d + X_0^T X_0)^{-1}. \tag{4.13} $$
By condition (iii), the matrix $V_A$ is nonsingular, and since $\Gamma$ is centered Gaussian, $-\Gamma(X_0)$ and $\Gamma(X_0)$ are identically distributed. Thus, the desired relation (3.3) follows from (4.11) and (4.13).