Prediction in polynomial errors-in-variables models

A multivariate errors-in-variables (EIV) model with an intercept term and a polynomial EIV model are considered. The focus is on the structural homoskedastic case, where the vectors of covariates are i.i.d. and the measurement errors are i.i.d. as well. The covariates contaminated with errors are normally distributed, and the corresponding classical errors are also assumed normal. In both models, it is shown that the (inconsistent) ordinary least squares estimators of the regression parameters yield an a.s. approximation to the best prediction of the response given the values of the observable covariates. Thus, not only in the linear EIV model but in polynomial EIV models as well, consistent estimators of the regression parameters are useless for prediction, provided the size and covariance structure of the observation errors for the predicted subject do not differ from those in the data used for model fitting.


1 Introduction
We deal with errors-in-variables (EIV) models which are widely used in system identification [10], epidemiology [2], econometrics [12], etc. In such regression models (with unknown parameter β), the response variable y depends on the covariates z and ξ, where z is observed precisely and ξ is observed with error. We consider the classical measurement error δ, i.e., instead of ξ the surrogate data x = ξ + δ is observed; moreover, the model is structural, i.e., z, ξ and δ are mutually independent, and we have i.i.d. copies of the model (z_i, ξ_i, δ_i, x_i = ξ_i + δ_i, y_i), i = 1, . . . , n. The measurement error is called nondifferential when the distribution of y given (ξ, z, x) depends only on (ξ, z), and differential otherwise [2, Section 2.5].
The present paper is devoted to the prediction of the response variable from ξ and z. Based on the observations (y_i, z_i, x_i), i = 1, . . . , n, and given new values z_0 and x_0 of the z and x variables, we want to predict either the new y_0 (this procedure is called individual prediction) or the exact relation η_0 = E[y_0 | z_0, ξ_0], where ξ_0 is a new value of ξ (this procedure is called mean prediction). Both prediction problems are important in econometrics [5]. The individual prediction is used in the leave-one-out cross-validation procedure.
The best mean squared error individual predictor is
ŷ_0 = E[y_0 | z_0, x_0],    (1)
and the best mean squared error predictor of η_0 is
η̂_0 = E[η_0 | z_0, x_0].    (2)
For the nondifferential measurement error, ŷ_0 = η̂_0, i.e., the best mean predictor coincides with the best individual predictor, but this need not hold for the differential measurement error. Both predictors (1) and (2) are unfeasible, because they involve unknown model parameters. Our goal is to construct consistent estimators of the predictors as the sample size n grows.
The nonparametric individual prediction under errors in covariates is studied in [7]. Below we consider only parametric models.
For scalar linear EIV models with normally distributed ξ and δ, it is stated in [4, Section 2.5.1] that the ordinary least squares (OLS) predictor should be used even when dealing with the EIV model. This is quite surprising, since the OLS estimator of β is inconsistent due to the attenuation effect [4]. In fact, there is no surprise that in a Gaussian model the linear OLS estimator provides a consistent prediction, since the Gaussian dependence is always linear. In the present paper, we consider a non-Gaussian regression model, since the distribution of the observable covariate z is not assumed Gaussian; therefore, the consistency of OLS predictions in such a model is a nontrivial feature.
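The attenuation effect, and the reason the attenuated fit still predicts well, can be illustrated by a minimal simulation (not from the paper; all numerical values are made-up for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, sigma_xi, sigma_delta = 2.0, 1.0, 1.0   # true slope and error scales (made-up values)

xi = rng.normal(0.0, sigma_xi, n)             # latent covariate
x = xi + rng.normal(0.0, sigma_delta, n)      # observed surrogate x = xi + delta
y = beta * xi + rng.normal(0.0, 0.5, n)       # scalar linear EIV response

# Naive OLS slope of y on x converges to K*beta, where K < 1 is the reliability ratio
b_hat = np.cov(x, y)[0, 1] / np.var(x)
K = sigma_xi**2 / (sigma_xi**2 + sigma_delta**2)
print(b_hat)          # close to K*beta = 1.0, not the true slope 2.0

# Under normality, E[y | x] = K*beta*x, so the attenuated fit is exactly the
# best mean squared error predictor of y from the observable x.
```

The OLS slope is biased as an estimator of β, yet it is the right coefficient for predicting y from the observable x.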
We confirm the assertion that the OLS estimator yields a suitable prediction under the model validity, for two kinds of EIV models: multivariate linear and polynomial. For this purpose, we just follow the recommendation of [4, Section 2.6] and analyze the regression of y on the observable z and x. In other nonlinear EIV models, the OLS predictor (constructed from the initial regression of y on (z, ξ), where we naively substitute x for ξ) is inconsistent; instead, the least-squares predictor from the regression of y on (z, x) can be used.
The paper is organized as follows. In Sections 2 and 3, we state the results on prediction in multivariate linear and polynomial EIV models, respectively. Section 4 studies briefly some other nonlinear EIV models, and Section 5 concludes.
Throughout the paper, all vectors are column vectors, E stands for the expectation and acts as an operator on the total product, and Cov(x) denotes the covariance matrix of a random vector x. By I_p we denote the identity matrix of size p. For symmetric matrices A and B of the same size, A > B and A ≥ B mean that A − B is positive definite or positive semidefinite, respectively.
2 Prediction in a multivariate linear EIV model

Model and main assumptions
Consider a multivariate linear EIV model with an intercept term (structural case):

y = b + C^T z + B^T ξ + e + ǫ,    (3)
x = ξ + δ.    (4)

Here the random vector y is the response variable distributed in R^d; the random vector z is the observable covariate distributed in R^q; the random vector ξ is the unobservable (latent) covariate distributed in R^m; x is the surrogate data observed instead of ξ; e + ǫ is the random error in y, and δ is the measurement error in the latent covariate; C ∈ R^{q×d}, B ∈ R^{m×d} and b ∈ R^d contain unknown regression parameters, where b is the intercept term. The random vector e models the error in the regression equation, and ǫ models the measurement error in y; ǫ can be correlated with δ. Such models are studied, e.g., in [11,10,9] in relation to system identification problems and numerical linear algebra. We list the model assumptions.

(i) The vectors z, ξ, e and the augmented measurement error vector (ǫ^T, δ^T)^T are mutually independent with finite 2nd moments; the errors ǫ and δ can be correlated.
(iii) The errors e, ǫ and δ have zero means.
Introduce the cross-covariance matrix Σ_ǫδ := E[ǫ δ^T]. The classical measurement error δ is nondifferential if, and only if, ǫ and δ are independent, i.e., Σ_ǫδ = 0 (see Section 1 for the definition of the nondifferential error). We denote also by Σ_11 the covariance matrix of the augmented observable covariate (z^T, x^T)^T; since z and x are independent, Σ_11 is a block-diagonal matrix, and sometimes we will use Σ_22 for the covariance matrix of x.

Regression of y on z and x
Lemma 1. Assume conditions (i) to (iv).
(a) The response variable (3) can be represented as
y = b_x + C^T z + B_x^T x + u,    (6)
where z, x, and u are independent, C remains unchanged compared with (3), and b_x ∈ R^d, B_x ∈ R^{m×d} are transformed (nonrandom) regression parameters.
(b) Assume additionally condition (v). Then the error term u in (6) has a positive definite covariance matrix Σ_u.
As a particular case, take a model with a univariate response and a univariate regressor ξ. Then expressions (6)-(8) hold true, where the error term u has a positive variance σ_u^2, for which a direct computation gives an explicit formula. Here, in the scalar case, we write σ^2 with a subscript for the variance of the corresponding scalar random variable.

Individual prediction
Now, consider independent copies of the multivariate model (3), (4):
y_i = b + C^T z_i + B^T ξ_i + e_i + ǫ_i, x_i = ξ_i + δ_i, i = 1, . . . , n.
Based on the observations
(y_i, z_i, x_i), i = 1, . . . , n,    (14)
and for given z_0, x_0, we want to estimate the individual predictor ŷ_0 presented in (1) and the mean predictor η̂_0 presented in (2). Assume conditions (i) to (iv) and suppose that all model parameters are unknown. Lemma 1 implies the expansion (6) with E u = 0. All the underlying random vectors have finite 2nd moments, hence
ŷ_0 = b_x + C^T z_0 + B_x^T x_0    (15)
is the best mean squared error predictor of y_0. Since it is unfeasible, we have to estimate the coefficients b_x, C and B_x using the sample (14). The OLS estimator (b̂_x, Ĉ, B̂_x) minimizes the penalty function
Q(b, C, B) := Σ_{i=1}^n ‖y_i − b − C^T z_i − B^T x_i‖^2.
Let a bar denote the average over i = 1, . . . , n, e.g., z̄ := n^{−1} Σ_{i=1}^n z_i, and let S_uv denote the sample covariance matrix of the u and v variables, e.g., S_zx := n^{−1} Σ_{i=1}^n (z_i − z̄)(x_i − x̄)^T, etc. The OLS estimator can be computed from the relations [11]
ȳ = b̂_x + Ĉ^T z̄ + B̂_x^T x̄,    (17)
(Ĉ^T, B̂_x^T)^T = S_11^+ S_1y,
where S_11 is the sample covariance matrix of the augmented covariate (z_i^T, x_i^T)^T and S_1y is the sample cross-covariance matrix of (z_i^T, x_i^T)^T and y_i. Hereafter A^+ is the pseudo-inverse of a square matrix A; see the properties of A^+ in [8]. The corresponding OLS predictor is
ỹ_0 := b̂_x + Ĉ^T z_0 + B̂_x^T x_0.    (19)

Theorem 1. Assume conditions (i) to (iv). Then ỹ_0 is a strongly consistent estimator of the best predictor ŷ_0, i.e., ỹ_0 → ŷ_0 a.s. as n tends to infinity. Moreover, y_0 − ỹ_0 → u_0 a.s., where u_0 is the error term of the expansion (6) for the new observation.

Proof. By the Strong Law of Large Numbers we have, a.s. as n → ∞, S_11 → Σ_11 and S_1y → Σ_1y. This convergence, relation (17) and the a.s. convergence of the sample means imply that b̂_x → b_x, Ĉ → C and B̂_x → B_x a.s. Now, both statements of Theorem 1 follow from (19) and (15).
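The regression of y on the observable (z, x) can be sketched numerically. The following simulation is an illustration only (scalar response, isotropic made-up covariance structure, invented parameter values); in this isotropic normal special case the transformed coefficient is B_x = K·B with reliability ratio K:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
C = np.array([1.0, -0.5])          # coefficients of the precisely observed z (made-up)
B = np.array([2.0, 1.0])           # coefficients of the latent xi (made-up)
b = 0.3                            # intercept
K = 0.5                            # reliability ratio: here sigma_xi = sigma_delta = 1

z = rng.normal(size=(n, 2))
xi = rng.normal(size=(n, 2))
x = xi + rng.normal(size=(n, 2))   # surrogate covariate
y = b + z @ C + xi @ B + rng.normal(0.0, 0.5, n)

# OLS of y on (1, z, x): the regression of y on the observable covariates
D = np.column_stack([np.ones(n), z, x])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

# In this isotropic normal case the best predictor is b + C'z0 + K*B'x0,
# so the fitted coefficients approach (b, C, K*B) = (0.3, 1, -0.5, 1, 0.5)
print(coef)
```

The x-coefficients converge to K·B rather than B, yet the fitted function is the best mean squared error predictor of y given (z_0, x_0).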
It is interesting to construct an asymptotic confidence region for the response y_0 based on the OLS predictor. Assume (i) to (iv); then y_0 − ỹ_0 behaves asymptotically as the error u_0, see (12). Introduce the estimator
Σ̂_u := n^{−1} Σ_{i=1}^n û_i û_i^T,
where û_i := y_i − b̂_x − Ĉ^T z_i − B̂_x^T x_i are the OLS residuals.

Theorem 2. (a) Assume additionally (v), and define the corresponding asymptotic confidence region for y_0 based on ỹ_0 and Σ̂_u; it has the prescribed asymptotic coverage probability. (b) Let the model (3)-(4) be purely normal, i.e., z is normally distributed and e = 0. Assume additionally that the matrix (9) is nonsingular, and define the confidence region via the quantiles of the corresponding normal distribution.

Proof. If b_x, C, and B_x were known, then we could approximate Σ_u as follows:
n^{−1} Σ_{i=1}^n u_i u_i^T → Σ_u a.s. as n → ∞.    (25)
Since u_i u_i^T is a quadratic function of the coefficients b_x, C, B_x, and the OLS estimators of those coefficients are strongly consistent, the convergence (25) remains valid if we replace all u_i with the residuals û_i.
(a) Under (v), Σ_u is nonsingular by Lemma 1(b). It holds that Σ̂_u → Σ_u a.s. Since the relations (20) and (26) hold true, the relations (22) and (21) follow.
(b) Again, in this purely normal model the matrix Σ_u is nonsingular; conditional on z_0 and x_0, the difference y_0 − ŷ_0 = u_0 has the normal distribution N(0, Σ_u). Since the relations (26) and (20) hold true, the statement follows.

Mean prediction
Still consider the model (3), (4) under conditions (i) to (iv). We want to estimate the mean predictor η̂_0 presented in (2). We have η̂_0 = E[η_0 | z_0, x_0], and by (11),
η̂_0 = ŷ_0 − Σ_ǫδ Σ_x^{−1} (x_0 − μ),    (27)
where μ := E x. Based on the observations (14), strongly consistent and unbiased estimators of μ and Σ_x are as follows:
μ̂ := x̄,    (28)
Σ̂_x := (n − 1)^{−1} Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T.    (29)

Theorem 3. Assume conditions (i) to (iv) and suppose that Σ_ǫδ is the only model parameter which is known. Consider the estimators (19), (28), and (29). Then
η̃_0 := ỹ_0 − Σ_ǫδ Σ̂_x^{+} (x_0 − μ̂)
is a strongly consistent estimator of the mean predictor (2).

Proof. The statement follows from relation (27), Theorem 2, and the strong consistency of the estimators μ̂ and Σ̂_x.
Notice that more model parameters should be known in order to construct a confidence region for η 0 aroundη 0 .

3 Prediction in a polynomial EIV model

Model and main assumptions
For a fixed and known k ≥ 2, consider a polynomial EIV model (structural case):
y = β_0 + c^T z + β_1 ξ + · · · + β_k ξ^k + e + ǫ,    (30)
x = ξ + δ.    (31)
Here the random variable (r.v.) y is the response variable; the random vector z is the observable covariate distributed in R^q; the r.v. ξ is the unobservable covariate; x is the surrogate data observed instead of ξ; e is the random error in the equation, and ǫ and δ are the measurement errors in the response and in the latent covariate; c ∈ R^q, β_0 ∈ R and β = (β_1, . . . , β_k)^T ∈ R^k contain unknown regression parameters; ǫ and δ can be correlated. Such models are studied, e.g., in [3,6] and applied, for instance, in econometrics. The model assumptions (a) to (d) are similar to conditions (i) to (iv) imposed on the multivariate linear model, but now the response and the latent covariate are real valued.

Lemma 3. Assume conditions (a) to (d). Then the response variable (30) admits the representation
y = β_0x + c^T z + β_1x x + · · · + β_kx x^k + u,    (33)
where z and (x, u)^T are independent, the vector c remains unchanged compared with (30), E[u | x] = 0, E[u^2 | x] < ∞, and β_0x ∈ R, β_x = (β_1x, . . . , β_kx)^T ∈ R^k are transformed (nonrandom) parameters of the polynomial regression.

Individual and mean prediction
We consider independent copies of the polynomial model (30)-(31):
y_i = β_0 + c^T z_i + β_1 ξ_i + · · · + β_k ξ_i^k + e_i + ǫ_i, x_i = ξ_i + δ_i, i = 1, . . . , n.
Based on observations (14) and for given z_0, x_0, we want to estimate the individual predictor ŷ_0 and the mean predictor η̂_0 for the polynomial model. Assume conditions (a) to (d) and suppose that all model parameters are unknown. Lemma 3 implies the expansion (33) with E[u | x, z] = 0. All the underlying r.v.'s and the random vector z have finite 2nd moments, hence
ŷ_0 = β_0x + c^T z_0 + β_1x x_0 + · · · + β_kx x_0^k
is the best mean squared error predictor of y_0. We estimate the coefficients c, β_0x and β_x using the sample (14) from the polynomial model. The OLS estimator (β̂_0x, ĉ, β̂_x) minimizes the penalty function
Q(β_0, c, β) := Σ_{i=1}^n (y_i − β_0 − c^T z_i − β_1 x_i − · · · − β_k x_i^k)^2,
and the corresponding OLS predictor is
ỹ_0 := β̂_0x + ĉ^T z_0 + β̂_1x x_0 + · · · + β̂_kx x_0^k.    (39)

Theorem 4. Assume conditions (a) to (d). Then the OLS predictor (39) is a strongly consistent estimator of the best predictor ŷ_0, i.e., ỹ_0 → ŷ_0 a.s. as n → ∞.

Proof. Following the lines of the proof of Theorem 2, it is enough to check the strong consistency of the estimators ĉ and β̂_x. We have, a.s. as n → ∞, the convergence of the sample means and sample covariance matrices of the variables involved. By conditions (b) and (d), x is a nondegenerate Gaussian r.v.; therefore, the r.v.'s 1, x, . . . , x^k are linearly independent in the Hilbert space L2(Ω, P) of square integrable r.v.'s, and the covariance matrix D of the vector (x, . . . , x^k)^T is nonsingular. Relations (38), (40), and (41) imply that, a.s. as n → ∞, ĉ → c and β̂_x → β_x, and the statements of Theorem 4 follow.
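A small simulation (illustrative made-up values, k = 2, z omitted) suggests how the transformed coefficients β_jx arise: the OLS fit of y on (1, x, x^2) approaches the coefficients of E[y | x], which under normality is again a quadratic in x:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
b0, b1, b2 = 1.0, 2.0, 1.0        # true polynomial coefficients (made-up values)
K, tau2 = 0.5, 0.5                # reliability ratio and Var(xi | x); sigma_xi = sigma_delta = 1

xi = rng.normal(size=n)
x = xi + rng.normal(size=n)
y = b0 + b1 * xi + b2 * xi**2 + rng.normal(0.0, 0.5, n)

# OLS of y on (1, x, x^2): a quadratic in the observed surrogate x
D = np.column_stack([np.ones(n), x, x**2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

# Under normality E[y|x] is again quadratic in x with transformed coefficients
# beta_0x = b0 + b2*tau2, beta_1x = b1*K, beta_2x = b2*K**2 (for mu = 0)
print(coef)   # approx [1.5, 1.0, 0.25]
```

The fitted polynomial in x differs from the true polynomial in ξ, yet it consistently estimates the best predictor of y from the observable x.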

Confidence interval for response in quadratic model
Consider a quadratic EIV model
y = β_0 + β_1 ξ + β_2 ξ^2 + e,
x = ξ + δ.
It is a particular case of the model (30), (31) with k = 2, z = 0 and ǫ = 0. We use notations (32). Our conditions are similar to (a)-(d), but additionally we assume that the reliability ratio
K := σ_ξ^2 / (σ_ξ^2 + σ_δ^2)    (45)
is separated away from zero. Thus, assume the following conditions.
(f) Model parameters are unknown, but a lower bound K 0 for the reliability ratio (45) is given, with 0 < K 0 ≤ 1/2.

Consider independent copies of the quadratic model.
Based on observations (y_i, x_i), i = 1, . . . , n, and for a given x_0, we can construct the OLS predictor ỹ_0, see (39), for y_0 with k = 2, z_0 = 0. Now we show how to construct an asymptotic confidence interval for y_0. (In a similar way this can be done for a polynomial EIV model of higher order.) First we write down the representation (36), (37). Denote by m_x := μ + K(x − μ) the conditional mean of ξ given x, and set γ := ξ − m_x. We have, with independent m_x and γ,
ξ = m_x + γ, γ ~ N(0, (1 − K) σ_ξ^2).
Then
y = β_0x + β_1x x + β_2x x^2 + u.
Here E(u | x) = 0. From this representation we get that the best prediction is
ŷ_0 = β_0x + β_1x x_0 + β_2x x_0^2.
Those coefficients can be estimated using the strongly consistent OLS estimator, cf. (38),
(β̂_1x, β̂_2x)^T = S_rr^+ S_ry, r := (x, x^2)^T.
The OLS estimator β̂_0x satisfies ȳ = β̂_0x + β̂_1x x̄ + β̂_2x x̄^(2), where x̄^(2) denotes the sample mean of the x_i^2, and the OLS predictor of y_0 is equal to
ỹ_0 = β̂_0x + β̂_1x x_0 + β̂_2x x_0^2.
To construct a confidence interval for y_0, we have to bound the conditional variance of u given x_0, see (49). The sample moments involved converge a.s. as n → ∞; therefore, the bound on the conditional variance can be estimated consistently. It remains to bound the difference between the true and the estimated bound; next, we express (52) through β_ix rather than β_i, using (50). Here A_+ := max(A, 0) and A_− := −min(A, 0), A ∈ R. Finally, we are ready to construct a confidence interval for y_0.
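The normal decomposition ξ = m_x + γ used above can be checked by simulation (all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
mu, s_xi, s_delta = 1.0, 2.0, 1.0            # made-up values
K = s_xi**2 / (s_xi**2 + s_delta**2)         # reliability ratio, here 0.8

xi = rng.normal(mu, s_xi, n)
x = xi + rng.normal(0.0, s_delta, n)

# Normal decomposition: xi = m_x + gamma, m_x = mu + K*(x - mu),
# gamma ~ N(0, (1 - K)*s_xi**2) independent of x
m_x = mu + K * (x - mu)
gamma = xi - m_x

print(np.corrcoef(x, gamma)[0, 1])           # approx 0: gamma is uncorrelated with x
print(np.var(gamma), (1 - K) * s_xi**2)      # both approx 0.8
```

Under joint normality the vanishing correlation implies independence, which is what makes the transformed coefficients β_ix computable in closed form.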

4 Prediction in other EIV models
The OLS predictor ỹ_0 approximates the best mean squared error predictor ŷ_0 presented in (1) not only in the polynomial EIV model. Let us consider the model with an exponential regression function
y = β exp(λξ) + e, x = ξ + δ,    (54)
where the real numbers β and λ are unknown regression parameters, and assume condition (e) from Section 3.4. Using the expansion (47)-(46), we get
E[y | x] = β_x exp(λ_x x)
with transformed (nonrandom) parameters β_x and λ_x. Under mild conditions, the OLS predictor ỹ_0 := β̂_x exp(λ̂_x x_0) is a strongly consistent estimator of ŷ_0, where β̂_x and λ̂_x are the OLS estimators of the regression parameters in the model (54).
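A numerical sanity check of the exponential case (assumed parameter values; β_x and λ_x computed from the standard lognormal moment formula under normality):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
beta, lam = 1.5, 0.8               # made-up regression parameters
K, tau2 = 0.5, 0.5                 # reliability ratio and Var(xi | x); sigma_xi = sigma_delta = 1

xi = rng.normal(size=n)
x = xi + rng.normal(size=n)
y = beta * np.exp(lam * xi) + rng.normal(0.0, 0.1, n)

# Lognormal moment formula under normality (mu = 0):
# E[y | x] = beta_x * exp(lam_x * x), lam_x = lam*K, beta_x = beta*exp(lam**2 * tau2 / 2)
beta_x = beta * np.exp(lam**2 * tau2 / 2)
lam_x = lam * K

# Monte Carlo check: conditional mean of y near x0 against the closed form
x0 = 1.0
mask = np.abs(x - x0) < 0.05
print(y[mask].mean(), beta_x * np.exp(lam_x * x0))
```

The conditional mean stays in the same exponential family, only with transformed parameters; this is why the least-squares fit of the same family in x yields a consistent predictor.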
A similar conclusion can be made for the trigonometric model
y = a_0 + Σ_{k=1}^m (a_k cos(kωξ) + b_k sin(kωξ)) + e, x = ξ + δ,
where a_k, 0 ≤ k ≤ m, b_k, 1 ≤ k ≤ m, and ω > 0 are unknown regression parameters. Finally, we give an example of a model where the OLS predictor does not approximate the best mean squared error predictor. Let
y = β |ξ + a| + e, x = ξ + δ,    (57)
where the real numbers β and a are unknown regression parameters, and assume condition (e) from Section 3.4; suppose also that σ_ξ^2 and σ_δ^2 are positive.
A direct computation based on the normal decomposition ξ = m_x + γ yields an expression for the conditional mean E[y | x] in terms of φ and Φ, where φ and Φ are the pdf and cdf of γ_0; the best mean squared error predictor ŷ_0 = E[y_0 | x_0] is obtained by substituting x_0 into this expression. The LS estimators k̂_x, β̂_x and b̂_x of k_x, β_x and b_x minimize the corresponding penalty function. Under mild additional conditions, the LS estimators are strongly consistent, and the LS predictor ỹ_0 converges a.s. to ŷ_0 as the sample size grows.
Notice that for this model (57), the OLS predictor β̂ |x_0 + â| need not converge in probability to ŷ_0, where the OLS estimators β̂ and â minimize the penalty function Σ_{i=1}^n (y_i − β|x_i + a|)^2.
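The failure in model (57) can be seen numerically: with a = 0 for simplicity (an illustrative simulation with made-up values), every predictor of the form b|x_0 + a'| vanishes at its kink, while the true conditional mean stays bounded away from zero there:

```python
import numpy as np
from math import sqrt, pi

rng = np.random.default_rng(5)
n = 400_000
beta = 2.0                          # made-up slope
K, tau = 0.5, sqrt(0.5)             # reliability ratio and sd of xi given x

xi = rng.normal(size=n)
x = xi + rng.normal(size=n)
y = beta * np.abs(xi) + rng.normal(0.0, 0.1, n)   # model (57) with a = 0

# With xi | x ~ N(K*x, tau^2), E[y | x = 0] = beta*tau*sqrt(2/pi) > 0,
# whereas every member of the naive family b*|x + a'| vanishes at its kink,
# so no such function can match the smooth conditional mean E[y | x].
mask = np.abs(x) < 0.05
print(y[mask].mean(), beta * tau * sqrt(2 / pi))   # both approx 1.13, not 0
```

The conditional mean leaves the parametric family |·| entirely, so the naive OLS predictor cannot approximate ŷ_0 however large the sample is.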

5 Conclusion
We considered structural EIV models with the classical measurement error. We gave a list of models where the OLS predictor of the response y_0 converges with probability one to the best mean squared error predictor ŷ_0 = E[y_0 | z_0, x_0]. In such models, the functional dependence ŷ_0 = ŷ_0(z_0, x_0) belongs to the same parametric family as the initial regression function η_0(z_0, ξ_0) = E[y_0 | z_0, ξ_0]. Such a situation looks exceptional for nonlinear models, and we gave an example of the model (57), where the OLS predictor does not perform well. We dealt with both the mean and individual prediction. They coincide in the case of nondifferential errors, where the errors in the response and in the covariates are independent. Otherwise, to construct the mean prediction, one has to know the covariance of the errors.
In linear models, we managed to construct an asymptotic confidence region for response around the OLS prediction, under totally unknown model parameters. In the quadratic model, we did it under the known lower bound of the reliability ratio. The procedure can be expanded to polynomial models of higher order.
Notice that in linear models without intercept and in incomplete polynomial models (like, e.g., y = β_0 + β_2 ξ^2 + e, x = ξ + δ), a prediction with (z, x) naively substituted for (z, ξ) in the regression of y on (z, ξ) can have huge prediction errors. As stated in [2, Section 2.6], predicting y from (z, x) is merely a matter of substituting known values of x and z into the regression model for y on (z, x). We can add that, in nonlinear EIV models, the corresponding error v = y − E[y | z, x] has a variance depending on x, i.e., the regression of y on (z, x) is heteroskedastic; this should be taken into account in order to construct a confidence region for y in a proper way.
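The heteroskedasticity of the regression of y on x can be illustrated in the quadratic model (a simulation sketch with assumed values; the conditional-variance formula below follows from the normal decomposition ξ = m_x + γ):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 600_000
b1, b2 = 1.0, 1.0                   # made-up quadratic coefficients
K, tau2 = 0.5, 0.5                  # reliability ratio and Var(xi | x); mu = 0

xi = rng.normal(size=n)
x = xi + rng.normal(size=n)
y = b1 * xi + b2 * xi**2 + rng.normal(0.0, 0.1, n)

# Residual about the best predictor E[y | x]; under normality
# Var(y - E[y|x] | x) = (b1 + 2*b2*K*x)**2 * tau2 + 2*b2**2*tau2**2 + 0.01
m = K * x
resid = y - (b1 * m + b2 * (m**2 + tau2))

for x0 in (0.0, 2.0):
    mask = np.abs(x - x0) < 0.1
    print(x0, resid[mask].var(),
          (b1 + 2*b2*K*x0)**2 * tau2 + 2 * b2**2 * tau2**2 + 0.01)
```

The residual variance grows with |x|, so a confidence region for y built under a homoskedasticity assumption would be too narrow in the tails of x.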
Finally, we make a caveat for practitioners. Consistent estimators of the EIV regression parameters are especially useful for prediction when the observation errors for the predicted subject differ from those in the data used for model fitting. This is usually the case when the model is fitted to experimental data while the prediction is made for a real-world subject. In this situation, using the inconsistent OLS estimators for prediction is a bad idea.