Estimation in a linear errors-in-variables model under a mixture of classical and Berkson errors

A linear structural regression model is studied in which the covariate is observed with a mixture of classical and Berkson measurement errors. The variances of both the classical and Berkson errors are assumed known. Without normality assumptions, consistent estimators of the model parameters are constructed, and conditions for their asymptotic normality are given. The estimators are divided into two asymptotically independent groups.


Introduction
Regression models with measurement errors in covariates are quite popular nowadays [1,2,4]; see also [5] for a comparison of various estimation methods in such models.
We consider a linear regression model in the presence of both classical and Berkson errors in the covariate:

y = β0 + β1 x + ε,   x = ξ + u,   w = ξ + δ.   (1.1)

Here, y is the observable response variable, ξ and x are unobservable latent variables, and w is the observable surrogate variable; ε, δ and u are centred errors: ε is the error in the response, δ is the classical measurement error, and u is the Berkson measurement error; the random variables ξ, ε, δ and u are independent. Thus, in model (1.1) we have a mixture of the classical and Berkson errors. Let D stand for the variance. We indicate two extreme cases.
(a) If Dδ = 0, then δ = 0 and w = ξ, so that (1.1) turns into the Berkson model

y = β0 + β1 x + ε,   x = w + u.   (1.2)

(b) If Du = 0, then u = 0 and x = ξ, so that (1.1) turns into a linear model with the classical error [1,2]:

y = β0 + β1 ξ + ε,   w = ξ + δ.   (1.3)

A mixture of the two types of errors occurs, e.g., in the radio-epidemiological dose model [4]:

D_i^mes = D̄_i^tr + σ_i γ_i,   D_i^tr = D̄_i^tr δ_F,i.   (1.4)

Here, D_i^mes is the measured individual instrumental absorbed thyroid dose for the i-th person of a cohort residing in the Ukrainian regions that suffered from the Chornobyl accident, D_i^tr is the corresponding true absorbed thyroid dose (i.e., the first latent variable), and D̄_i^tr is the second latent variable; σ_i γ_i is the additive classical error, δ_F,i is the multiplicative Berkson error, σ_i is the standard deviation of the heteroscedastic classical measurement error, γ_i is standard normal, and δ_F,i is a lognormal random variable; D̄_i^tr, γ_i and δ_F,i are independent random variables.
In [4], the model (1.4) is combined with a binary model resembling a logistic one, in which λ_i is the total incidence rate related to cases of thyroid cancer; the positive regression coefficients λ0 and EAR are the background incidence rate and the excess absolute risk, respectively. The goal of the present paper is to study asymptotic properties of estimators of the model parameters in the linear regression (1.1). A more modest aim is to gain a better understanding of the binary model (1.5), (1.6), (1.4) and similar models.
The paper is organized as follows. In Section 2, we present the observation model in more detail and, under the normality of x and u, derive from the underlying model one like (1.3) with the classical error only. Along the way we obtain consistent estimators of β0 and β1 which, unexpectedly, coincide with the adjusted least squares estimators [2,4] constructed by ignoring the Berkson error u. The proposed estimators remain consistent without the normality of x and u. Section 3 gives conditions for the asymptotic normality of the estimators, and we divide them into two asymptotically independent groups. In doing so, we reparametrize the model similarly to [3], where the basic model (1.3) was studied. Section 4 concludes our findings.

We use the following notation. The symbol E denotes expectation and acts as an operator on the total product of quantities; cov stands for the covariance of two random variables and for the covariance matrix of a random vector. The upper index ⊤ denotes transposition; in this paper, all vectors are column ones. The bar means averaging over i = 1, . . . , n, e.g., ā := n⁻¹ Σ_{i=1}^n a_i. Convergence with probability 1 and convergence in distribution are denoted by →^{P1} and →^d, respectively. A sequence of random variables that converges to zero in probability is denoted by o_p(1), and a sequence of random variables bounded in probability by O_p(1). I_p stands for the identity p × p matrix.
Consider model (1.1) under the following assumptions.

(i) Random variables ξ, ε, δ and u are independent.
(ii) Random variables ε, δ and u have zero expectations and finite variances, and x has a finite and positive variance σ²_x.
(iii) The variances σ²_δ and σ²_u are positive and known; the other model parameters are unknown.

Consider independent copies of model (1.1):

y_i = β0 + β1 x_i + ε_i,   x_i = ξ_i + u_i,   w_i = ξ_i + δ_i,   i = 1, 2, . . .

Under assumption (i), this means that the random vectors (ξ_i, ε_i, δ_i, u_i)⊤, i = 1, 2, . . ., are i.i.d. and have the same distribution as (ξ, ε, δ, u)⊤. Based on the observations (y_i, w_i), i = 1, . . . , n, we want to estimate the unknown model parameters.
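For concreteness, observations from model (1.1) can be generated as in the following minimal sketch. The Gaussian distributions and all parameter values here are illustrative assumptions, not prescribed by the model.

```python
import random

def simulate_mixture(n, beta0, beta1, s2_delta, s2_u, mu=0.0, s2_xi=1.0, seed=0):
    """Draw i.i.d. observations (y_i, w_i) from model (1.1):
        y = beta0 + beta1 * x + eps,  x = xi + u,  w = xi + delta.
    Assumed specification: xi ~ N(mu, s2_xi), eps ~ N(0, 1), and
    xi, u, delta, eps drawn independently. Only (y_i, w_i) are observable."""
    rng = random.Random(seed)
    y, w = [], []
    for _ in range(n):
        xi = rng.gauss(mu, s2_xi ** 0.5)               # latent variable xi
        x = xi + rng.gauss(0.0, s2_u ** 0.5)           # Berkson error u
        y.append(beta0 + beta1 * x + rng.gauss(0.0, 1.0))   # error in response
        w.append(xi + rng.gauss(0.0, s2_delta ** 0.5))      # classical error delta
    return y, w

y, w = simulate_mixture(2000, 1.0, 2.0, 1.0, 1.0)
```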
Lemma 1. Consider model (1.1) under conditions (i) and (ii). Let σ²_δ be known and the random variables ξ, ε, δ and u be Gaussian. Then the model with the 6 unknown parameters (β0, β1, μ, σ²_x, σ²_ε, σ²_u), where μ := Ex, is not identifiable.

Proof. The distribution of the observed Gaussian vector Z := (y, w)⊤ is uniquely determined by its mean EZ and covariance matrix C := cov(Z). Consider two different collections of the model parameters: (β0, β1, μ, σ²_x, σ²_ε, σ²_u) and (β0, β1, μ, σ²_x + t, σ²_ε − β1²t, σ²_u + t), where t > 0 is small enough that σ²_ε − β1²t > 0. In both cases it holds that EZ = (β0 + β1μ, μ)⊤, Dy = β1²σ²_x + σ²_ε, Dw = σ²_x − σ²_u + σ²_δ and cov(y, w) = β1(σ²_x − σ²_u). Therefore, the distribution of Z is the same for both collections of parameters, and the model is not identifiable.
Notice that under the conditions of Lemma 1, the parameters β0 and β1 are identifiable (see [2] for the definition of an identifiable parameter). Moreover, in the next subsection we construct estimators of β0 and β1 that are consistent as n → ∞ and rely on σ²_δ as the only known error variance.
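The non-identifiability can also be illustrated numerically. The moment formulas below are the ones implied by the structure x = ξ + u, w = ξ + δ with independent ξ, ε, δ, u (an assumed specification), and the parameter values are hypothetical; shifting (σ²_x, σ²_ε, σ²_u) in a suitable direction leaves all observable moments unchanged while β0 and β1 stay fixed.

```python
def z_moments(beta0, beta1, mu, s2_x, s2_eps, s2_u, s2_delta):
    """Mean and second moments of the observed Z = (y, w) implied by model (1.1)
    with x = xi + u, w = xi + delta and independent xi, eps, delta, u
    (assumed specification); s2_x denotes the variance of the latent x."""
    Ey = beta0 + beta1 * mu
    Ew = mu
    Dy = beta1 ** 2 * s2_x + s2_eps
    Dw = (s2_x - s2_u) + s2_delta        # D(xi) = s2_x - s2_u
    Cyw = beta1 * (s2_x - s2_u)
    return (Ey, Ew, Dy, Dw, Cyw)

# Two different parameter collections (hypothetical values) sharing the same
# observable moments; beta0 and beta1 are identical in both, as in Lemma 1.
t = 0.1
m1 = z_moments(1.0, 2.0, 0.0, 2.0, 1.0, 0.5, 1.0)
m2 = z_moments(1.0, 2.0, 0.0, 2.0 + t, 1.0 - 2.0 ** 2 * t, 0.5 + t, 1.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(m1, m2))
```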

Consistent estimators of model parameters
Now, besides conditions (i) to (iii), we assume the following.
(iv) Random variables x and u are Gaussian.

Now, starting from (1.1), we derive a linear model with the classical error only. Under (iv), the conditional distribution of ξ given x is Gaussian [1,4], which yields the representation

ξ = (1 − K)μ + Kx + γ,   K := (σ²_x − σ²_u)/σ²_x,   μ := Ex,

where γ is a centred Gaussian random variable with Dγ = Kσ²_u, and x, γ, ε, δ are mutually independent. Then w = ξ + δ = (1 − K)μ + Kx + γ + δ. Introducing the new variables w° := (w − (1 − K)μ)/K and δ° := (γ + δ)/K, we derive a linear model with the classical error only:

y = β0 + β1 x + ε,   w° = x + δ°.   (2.1)

Suppose for the moment that K is known. Then the adjusted least squares (ALS) estimator β̃1 of β1 in model (2.1) is consistent [2,4]. When K is unknown, we can estimate it consistently by

K̂ := (S_ww − σ²_δ)/(S_ww − σ²_δ + σ²_u),   (2.3)

where S_ww := \overline{(w − w̄)²} and, below, S_wy := \overline{(w − w̄)(y − ȳ)}. Substituting (2.3) for K in β̃1, we obtain the desired estimator

β̂1 = S_wy/(S_ww − σ²_δ).   (2.4)

Next, in model (2.1) the ALS estimator of β0 involves K and μ [2,4]. Since K and μ are unknown, we substitute the consistent estimators (2.3) and

μ̂ := w̄.   (2.5)

Then β̃1 changes to β̂1, and we obtain the desired estimator

β̂0 = ȳ − β̂1 w̄.   (2.6)

It is remarkable that β̂0 and β̂1 are the so-called naive ALS estimators in the model (1.1), obtained by neglecting the presence of the Berkson error u; to be precise, β̂0 and β̂1 are the ALS estimators for the classical model (1.3). The estimators (2.4) and (2.6) use σ²_δ but not σ²_u. In our model, we have to estimate the 5 parameters β0, β1, μ, σ²_x and σ²_ε. We already possess the 3 estimators (2.6), (2.4) and (2.5); moreover, in constructing (2.3) we used the estimator

σ̂²_x := S_ww − σ²_δ + σ²_u   (2.7)

of σ²_x. Finally, in model (2.1) the ALS estimator of σ²_ε involves K [4]; substituting (2.3) for the unknown K, we get the final estimator

σ̂²_ε := S_yy − β̂1² σ̂²_x,   S_yy := \overline{(y − ȳ)²}.   (2.8)

Though we derived the estimators under the normality assumption (iv), they remain consistent without this restriction.

Proof. Here, we check the strong consistency of β̂1 only. By the strong law of large numbers, S_wy →^{P1} cov(w, y) = β1(σ²_x − σ²_u) and S_ww − σ²_δ →^{P1} Dw − σ²_δ = σ²_x − σ²_u > 0; hence β̂1 →^{P1} β1, where →^{P1} denotes convergence with probability 1, which establishes the strong consistency of the estimator.
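The final estimators admit a compact implementation. The sketch below computes β̂1 and β̂0 in their naive-ALS form and checks them on data simulated from (1.1); the independent Gaussian ξ, u, δ, ε and all parameter values are illustrative assumptions. Note that σ²_u is used to generate the data but is not used by the estimators.

```python
import random

def als_estimates(y, w, s2_delta):
    """ALS estimators for the classical model (1.3); by the remark above,
    these coincide with the estimators (2.4) and (2.6) for model (1.1)."""
    n = len(y)
    wbar = sum(w) / n
    ybar = sum(y) / n
    s_ww = sum((a - wbar) ** 2 for a in w) / n
    s_wy = sum((a - wbar) * (b - ybar) for a, b in zip(w, y)) / n
    b1 = s_wy / (s_ww - s2_delta)    # slope corrected for the classical error
    b0 = ybar - b1 * wbar
    return b0, b1

# Consistency check on data from the mixture model (1.1)
# (assumed specification: independent Gaussian xi, u, delta, eps):
rng = random.Random(3)
n, beta0, beta1, s2_delta, s2_u = 200_000, 1.0, 2.0, 1.0, 1.0
y, w = [], []
for _ in range(n):
    xi = rng.gauss(1.0, 1.0)
    x = xi + rng.gauss(0.0, s2_u ** 0.5)            # Berkson part
    y.append(beta0 + beta1 * x + rng.gauss(0.0, 1.0))
    w.append(xi + rng.gauss(0.0, s2_delta ** 0.5))  # classical part
b0_hat, b1_hat = als_estimates(y, w, s2_delta)
```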

Asymptotic variance of the estimator of slope coefficient
We need the following moment assumption.

Definition 1.
Let α̂ and β̂ be asymptotically normal estimators of α ∈ R^p and β ∈ R^q, respectively, i.e., √n((α̂ − α)⊤, (β̂ − β)⊤)⊤ →^d N(0, Σ) with a nonsingular asymptotic covariance matrix Σ. The estimators α̂ and β̂ are called asymptotically independent if Σ can be partitioned as Σ = block-diag(Σ_α, Σ_β) with Σ_α ∈ R^{p×p} and Σ_β ∈ R^{q×q}. It is convenient to deal with asymptotically independent estimators α̂ and β̂, because an asymptotic confidence region for the augmented parameter (α⊤, β⊤)⊤ can be constructed as the Cartesian product of asymptotic confidence ellipsoids for α and β.
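The practical point of Definition 1 can be checked numerically: for a block-diagonal (here, for simplicity, diagonal) limit covariance, the Cartesian product of two marginal 95% intervals covers the pair with probability about 0.95² ≈ 0.9025. A minimal Monte Carlo sketch with assumed scalar blocks:

```python
import random

# Independent components of the limit law (assumed marginal standard deviations);
# count how often both marginal 95% intervals cover simultaneously.
rng = random.Random(11)
N, z = 20_000, 1.96
sd_a, sd_b = 1.0, 3.0
hits = 0
for _ in range(N):
    a = rng.gauss(0.0, sd_a)
    b = rng.gauss(0.0, sd_b)
    hits += (abs(a) <= z * sd_a) and (abs(b) <= z * sd_b)
freq = hits / N   # close to 0.95**2 = 0.9025
```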
Proof. (a) We prove (3.13) with a nonsingular asymptotic covariance matrix Σ_θ. 1. Since all the variances in the underlying model are assumed positive, the true vector θ is an inner point of the parameter set Θ = R² × (0, ∞) × R × (0, ∞).
Since ε, x, δ and u have finite 4th moments, B is well defined. The unbiasedness of the estimating function s(θ; w, y), the consistency of θ̂ and the nonsingularity of V imply (3.13) by Theorem A.26 from [4], and Σ_θ is given by the sandwich formula. 2. It remains to prove that B is nonsingular. For this purpose, we have to show that the five components of the score s(θ; w, y) are linearly independent for the true value of θ. Consider the random vector h introduced in (3.14). The score vector can be written as Th + a, where T = T(θ) is a nonsingular square matrix and a = a(θ) is a nonrandom vector. Since T is nonsingular, it is enough to show that no nontrivial linear combination of the components of h is constant. We use the centred variable ρ = x − μ. Suppose that some linear combination of the components of h, with real coefficients a11, a12, a22, a1, a2 and a3, is constant with probability 1. Then, almost surely,
Thus, in this case (3.16) holds as well. Statement (a) of Theorem 3 is proven.
(b) Now, we rely additionally on assumption (vi) about vanishing centred third moments. By statement (a), B is nonsingular. We have to show that it has the block-diagonal structure B = block-diag(B1, B2) (3.17) with some matrices B1 ∈ R^{2×2} and B2 ∈ R^{3×3}; then Σ_θ is block-diagonal as well, with nonsingular blocks, and statement (b) of Theorem 3 follows.

Simulation study
We simulated test data in order to evaluate the coverage probability of the asymptotic confidence interval for the slope parameter, constructed on the basis of Theorem 2. Observations in model (1.1) were generated as follows: ξ ∼ N(−1, 1), u ∼ N(0, σ²_u) with σ²_u ∈ {10i : i = 1, . . . , 15}, δ ∼ N(0, 1), ε ∼ N(0, 1), β1 = 2, β0 = 1, with the sample size n ∈ {10i : i = 1, . . . , 10}. For each collection of model parameters, N = 10,000 realizations were generated. For each realization, the slope estimate and the estimate of its asymptotic variance were computed (here, we inserted into (3.2) the estimates of all unknown model parameters). For each ensemble of N realizations, the empirical coverage of the constructed 95% asymptotic confidence intervals for the slope parameter was calculated. We briefly report the obtained results. Figure 1 shows how the deviation of the coverage probability from 0.95 decreases as the sample size grows; this effect is stable over different values of the Berkson error variance. Figure 2 illustrates how the deviation of the coverage probability from 0.95 increases with the Berkson error variance. As can be seen in Figure 1, the latter effect weakens as the sample size grows.
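The Monte Carlo methodology of this section can be sketched as follows. This is not the paper's exact setup: the standard error below is a generic sandwich-type estimate for the naive-ALS slope rather than formula (3.2), and all distributions and parameter values are assumed for illustration.

```python
import random

def als_slope_ci(y, w, s2_delta, z=1.96):
    """Naive-ALS slope estimate with a generic sandwich-type standard error
    (an assumed stand-in for the asymptotic variance formula (3.2))."""
    n = len(y)
    wbar = sum(w) / n
    ybar = sum(y) / n
    s_ww = sum((a - wbar) ** 2 for a in w) / n
    s_wy = sum((a - wbar) * (b - ybar) for a, b in zip(w, y)) / n
    b1 = s_wy / (s_ww - s2_delta)
    # influence-function (sandwich) variance estimate for b1
    psi = [(a - wbar) * (b - ybar) - b1 * ((a - wbar) ** 2 - s2_delta)
           for a, b in zip(w, y)]
    v = (sum(p * p for p in psi) / n) / (n * (s_ww - s2_delta) ** 2)
    half = z * v ** 0.5
    return b1, (b1 - half, b1 + half)

rng = random.Random(7)
beta0, beta1, s2_delta, s2_u = 1.0, 2.0, 1.0, 1.0
reps, n, hits = 250, 800, 0
for _ in range(reps):
    y, w = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        x = xi + rng.gauss(0.0, s2_u ** 0.5)            # Berkson part
        y.append(beta0 + beta1 * x + rng.gauss(0.0, 1.0))
        w.append(xi + rng.gauss(0.0, s2_delta ** 0.5))  # classical part
    b1, (lo, hi) = als_slope_ci(y, w, s2_delta)
    hits += (lo <= beta1 <= hi)
coverage = hits / reps   # empirical coverage of the nominal 95% interval
```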

Conclusion
We dealt with the linear observation model (1.1) containing a mixture of the classical and Berkson errors in the covariate. Surprisingly enough, we constructed consistent estimators of the regression parameters without knowledge of the variance of the Berkson error. Nevertheless, the size of the Berkson error does influence the asymptotic variances of β̂0 and β̂1. We then transformed the model to the equivalent centred form (3.10), which made it possible to divide the estimators of all unknown model parameters into two asymptotically independent groups.
In the future, we intend to consider the prediction problem for model (1.1), as was done in [6] for various measurement error models. It would also be interesting to study a polynomial model with a mixture of the classical and Berkson errors, as well as a version of the linear model with a vector response and a vector covariate.

Funding
The research is supported by the National Research Foundation of Ukraine, grant No. 2020.02/0026.