Linear regression by observations from mixture with varying concentrations

We consider a finite mixture model with varying mixing probabilities. Linear regression models are assumed for observed variables with coefficients depending on the mixture component the observed subject belongs to. A modification of the least-squares estimator is proposed for estimation of the regression coefficients. Consistency and asymptotic normality of the estimates are demonstrated.


Introduction
In this paper, we discuss a structural linear regression technique in the context of the model of mixture with varying concentrations (MVC). MVC means that the observed subjects belong to $M$ different subpopulations (mixture components). The true numbers of the components to which the subjects $O_j$, $j = 1, \dots, N$, belong, say, $\kappa_j = \kappa(O_j)$, are unknown, but we know the probabilities $p^m_{j;N} = P\{\kappa_j = m\}$, $m = 1, \dots, M$ (mixing probabilities or concentrations of the mixture components). MVC models arise naturally in the description of medical, biological, and sociological data [1,8,9,12]. They can be considered as a generalization of finite mixture models (FMM). The classical theory of FMMs can be found in the monographs [10,13].
Let $\xi(O) = (Y(O), X^1(O), \dots, X^d(O))^T$ be a vector of observed features (random variables) of a subject $O$. We consider the following linear regression model for these variables:
$$Y(O) = \sum_{i=1}^{d} b_i^{(\kappa(O))} X^i(O) + \varepsilon(O), \tag{1}$$
where $b^{(m)} = (b^{(m)}_1, \dots, b^{(m)}_d)^T$ is the vector of unknown regression coefficients for the $m$th component, and $\varepsilon(O)$ is the error term.
Note. We consider a subject $O$ as taken at random from an infinite population, so it is random in this sense. The vector of observed variables $\xi(O)$ can be considered as a random vector even for a fixed $O$.
A statistical model similar to MVC with (1) is considered in [5], where a parametric model for the conditional distributions of $\varepsilon(O)$ given $\kappa(O)$ is assumed. For this case, maximum likelihood estimation is proposed in [5], and a version of the EM algorithm is developed for numerical computation of the estimates.
In this paper, we adopt a nonparametric approach, assuming no parametric models for the distributions of $\varepsilon(O)$ and $X(O)$. Nonparametric and semiparametric techniques for MVC were developed in [4,6,7]. We use the weighted empirical moment technique to derive estimates for the regression coefficients and then obtain conditions of consistency and asymptotic normality of the estimates. These results are based on general ideas of least squares [11] and moment estimates [3].
The rest of the paper is organized as follows. In Section 2, we recall some results on nonparametric estimation of functional moments in general MVC. The estimates are introduced, and conditions of their consistency and asymptotic normality are presented in Section 3. Section 4 contains proofs of the statements of Section 3. Results of computer simulations are presented in Section 5.

Nonparametric estimation for MVC
Let us start with some notation and definitions. We denote by $F_m$ the distribution of $\xi(O)$ for $O$ belonging to the $m$th component of the mixture, that is, $F_m(A) = P\{\xi(O) \in A \mid \kappa(O) = m\}$ for all measurable sets $A$. Then by the definition of MVC
$$P\{\xi_j \in A\} = \sum_{m=1}^{M} p^m_{j;N} F_m(A), \tag{2}$$
where $\xi_j = \xi(O_j)$ and $p^m_{j;N} = P\{\kappa_j = m\}$ are the mixing probabilities.
In the asymptotic statements, we will consider the data $\Xi_N = (\xi_1, \dots, \xi_N)$ as an element of an (imaginary) series of data $\Xi_1, \Xi_2, \dots, \Xi_N, \dots$ in which no link between observations for different $N$ is assumed. So, in formal notation, it would be more correct to write $\xi_{j;N}$ instead of $\xi_j$, but we will drop the subscript $N$ when it is insignificant.
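We consider the array of all concentrations for all data sizes, $p = (p^m_{j;N};\ j = 1, \dots, N,\ m = 1, \dots, M,\ N = 1, 2, \dots)$, and regard its subarray $p_{;N} = (p^m_{j;N})_{j = 1, \dots, N;\ m = 1, \dots, M}$ for a fixed $N$ as an $N \times M$ matrix with columns $p^m_{;N} = (p^m_{1;N}, \dots, p^m_{N;N})^T$. We will also consider a weight array $a$ of the same structure as $p$ with similar notation for its subarrays. By the angle brackets with subscript $N$ we denote the averaging over $j = 1, \dots, N$:
$$\langle p^m \rangle_N = \frac{1}{N} \sum_{j=1}^{N} p^m_{j;N}.$$
Multiplication, summation, and other operations in the angle brackets are made elementwise:
$$\langle a^m p^k \rangle_N = \frac{1}{N} \sum_{j=1}^{N} a^m_{j;N} p^k_{j;N}.$$
Assume now that model (2) holds for the data $\Xi_N$. Then the distribution $F_m$ of the $m$th component can be estimated by the weighted empirical measure
$$\hat F_{m;N}(A) = \frac{1}{N} \sum_{j=1}^{N} a^m_{j;N} \mathbf{1}\{\xi_j \in A\},$$
where $a^m_{j;N} = \sum_{k=1}^{M} (\Gamma_N^{-1})_{mk}\, p^k_{j;N}$ and $\Gamma_N = (\langle p^l p^k \rangle_N)_{l,k=1}^{M}$ is the Gram matrix of the concentration vectors.
It is shown in [8] that if $\Gamma_N$ is nonsingular, then $\hat F_{m;N}$ is the minimax unbiased estimate for $F_m$. The consistency of $\hat F_{m;N}$ is demonstrated in [6] (see also [8]).
Consider now functional moment estimation based on weighted empirical moments. Let $g : R^{d+1} \to R^k$ be a measurable function. Then to estimate $\bar g^{(m)} = \int g(x)\, F_m(dx)$, we can use $\hat g^{(m)}_{;N} = \int g(x)\, \hat F_{m;N}(dx) = \frac{1}{N} \sum_{j=1}^{N} a^m_{j;N}\, g(\xi_j)$.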
Lemma 1, on the consistency of $\hat g^{(m)}_{;N}$, is a simple corollary of Theorem 4.2 in [8] (see also Theorem 3.1.1 in [7]).
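The weighted empirical moments of this section can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes the weights have the form $a^m_{j;N} = \sum_k (\Gamma_N^{-1})_{mk} p^k_{j;N}$ with $\Gamma_N = (\langle p^l p^k \rangle_N)$, which guarantees $\langle a^m p^k \rangle_N = \mathbf{1}\{m = k\}$. The function names `minimax_weights` and `weighted_moment` and the example concentrations are hypothetical.

```python
import numpy as np

def minimax_weights(p):
    """Return the N x M matrix whose m-th column is the weight vector a^m.

    p is the N x M matrix of concentrations. Assumed form: a = p Gamma^{-1},
    where Gamma = p^T p / N is the Gram matrix of the concentration vectors.
    """
    n = p.shape[0]
    gamma = p.T @ p / n
    # Gamma is symmetric, so (Gamma^{-1} p^T)^T equals p Gamma^{-1}
    return np.linalg.solve(gamma, p.T).T

def weighted_moment(a_m, g_values):
    """Weighted empirical moment (1/N) sum_j a^m_j g(xi_j).

    g_values holds g(xi_j) row by row, with shape (N,) or (N, k).
    """
    return a_m @ g_values / len(a_m)

# Illustrative two-component concentrations (an assumption for this sketch)
N = 1000
p1 = np.arange(1, N + 1) / N
p = np.column_stack([p1, 1 - p1])
a = minimax_weights(p)

# Unbiasedness check: the matrix of <a^m p^k>_N should be close to the identity
print(np.round(a.T @ p / N, 10))
```

Note that some entries of `a` are negative here, which is why the weighted measure is not a probability measure in general.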

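Lemma 2 (Asymptotic normality). Assume that
1. [...]
2. There exists $C > 0$ such that $\det \Gamma_N > C$ for all $N$ large enough.
3. There exists the limit $\Sigma = \lim_{N \to \infty} N \operatorname{Cov}(\hat g^{(m)}_{;N})$.
Then $\sqrt{N}(\hat g^{(m)}_{;N} - \bar g^{(m)})$ converges weakly to $N(0, \Sigma)$, where $\bar g^{(m)}$ is the moment of $g$ under $F_m$.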
For univariate $\hat g^{(m)}_{;N}$, the statement of the lemma is contained in Theorem 4.2 from [8] (or Theorem 3.1.2 in [7]). The multivariate case can be obtained from the univariate one by applying the Cramér-Wold device (see [2], p. 382).

Estimate for $b^{(m)}$ and its asymptotics
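In view of Lemma 1, we expect that, under suitable assumptions, the weighted least-squares functional
$$J_{m;N}(b) = \frac{1}{N} \sum_{j=1}^{N} a^m_{j;N} \Big( Y_j - \sum_{i=1}^{d} b_i X^i_j \Big)^2$$
converges to $J_m(b) = E\big[(Y(O) - \sum_{i=1}^{d} b_i X^i(O))^2 \mid \kappa(O) = m\big]$ as $N \to \infty$. Since we expect $b^{(m)}$ to minimize $J_m(b)$, it is natural to suggest the argmin of $J_{m;N}(b)$ as an estimate for $b^{(m)}$. If the weights $a^m$ were positive, then this argmin would be
$$\hat b^{(m)}_{;N} = (X^T A X)^{-1} X^T A Y,$$
where $X = (X^i_j)_{j = 1, \dots, N;\ i = 1, \dots, d}$ is the $N \times d$ matrix of regressors, $Y = (Y_1, \dots, Y_N)^T$, and $A = \operatorname{diag}(a^m_{1;N}, \dots, a^m_{N;N})$. The weights $a^m_{j;N}$ may be negative, but we take $\hat b^{(m)}_{;N}$ defined by this formula as an estimate for $b^{(m)}$ in any case.
Denote by $D^{(m)} = E[X(O) X(O)^T \mid \kappa(O) = m]$ the matrix of second moments of the regressors $X(O) = (X^1(O), \dots, X^d(O))^T$ in the $m$th component. In what follows, we assume that these moments and the corresponding variances exist for all components.
The estimate $\hat b^{(m)}_{;N}$ is consistent if the vector $p^m_{;N}$ is asymptotically linearly independent from the vectors $p^i_{;N}$, $i \ne m$, as $N \to \infty$. To avoid complexities in this presentation, we do not formulate the strict meaning of this statement.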
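Theorem 2 (Asymptotic normality). Assume that
1. [...]
2. [...]
3. There exists $C > 0$ such that $\det \Gamma_N > C$ for all $N$ large enough.
Then $\sqrt{N}(\hat b^{(m)}_{;N} - b^{(m)})$ converges weakly to $N(0, V)$.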
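The estimate discussed in this section can be sketched in a few lines. The code below is an illustration under a stated assumption, namely that the estimate has the weighted least-squares form $\hat b^{(m)}_{;N} = (X^T A X)^{-1} X^T A Y$ with $A = \operatorname{diag}(a^m_{1;N}, \dots, a^m_{N;N})$. The function name `mvc_least_squares`, the coefficients, and the weights in the sanity check are hypothetical.

```python
import numpy as np

def mvc_least_squares(X, Y, a_m):
    """Weighted least squares b = (X^T A X)^{-1} X^T A Y with A = diag(a_m).

    X: (N, d) regressors, Y: (N,) responses, a_m: (N,) weights, possibly negative.
    """
    XtA = X.T * a_m                      # X^T A without forming the N x N matrix A
    return np.linalg.solve(XtA @ X, XtA @ Y)

# Sanity check on noise-free data from a single component: any weights for
# which X^T A X is nonsingular must recover the coefficients exactly.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(1.0, 1.0, size=200)])
b = np.array([0.5, 2.0])                 # hypothetical coefficients
Y = X @ b
a_m = rng.normal(1.0, 0.5, size=200)     # arbitrary weights, some may be negative
b_hat = mvc_least_squares(X, Y, a_m)
print(b_hat)
```

Solving the normal equations with `np.linalg.solve` avoids an explicit matrix inverse. With unit weights the function reduces to ordinary least squares.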

Proofs
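Proof of Theorem 1. Note that if $D^{(m)} = E[X(O) X(O)^T \mid \kappa(O) = m]$ is nonsingular, then $b^{(m)} = (D^{(m)})^{-1} E[X(O) Y(O) \mid \kappa(O) = m]$. By Lemma 1, with $X_j = (X^1_j, \dots, X^d_j)^T$, $\frac{1}{N} \sum_{j=1}^{N} a^m_{j;N} X_j X_j^T \to D^{(m)}$ and $\frac{1}{N} \sum_{j=1}^{N} a^m_{j;N} X_j Y_j \to E[X(O) Y(O) \mid \kappa(O) = m]$ as $N \to \infty$. This implies the statement of the theorem.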

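Proof of Theorem 2. Let us introduce a set of random vectors $\xi^{(k)}_j = (Y^{(k)}_j, X^{1,(k)}_j, \dots, X^{d,(k)}_j)^T$, $j = 1, 2, \dots$, $k = 1, \dots, M$, with distributions $F_k$ that are independent for different $j$ and $k$ and independent from $\kappa_j$. Denote $\delta^{(k)}_j = \mathbf{1}\{\kappa_j = k\}$ and $\xi'_j = \sum_{k=1}^{M} \delta^{(k)}_j \xi^{(k)}_j$. Then the distribution of $\Xi'_N = (\xi'_1, \dots, \xi'_N)$ is the same as that of $\Xi_N$. Since in this theorem we are interested in weak convergence only, without loss of generality, let us assume that $\Xi_N = \Xi'_N$. By $\mathfrak{F}$ we denote the sigma-algebra generated by $\xi$ [...]. The claim is that $\sqrt{N}(\hat b^{(m)}_{;N} - b^{(m)})$ converges weakly to $N(0, V)$. It is readily seen that [...]. Obviously, [...].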
We will apply Lemma 2 to show that $\sqrt{N}\,\hat g$ [...].
In view of Lemma 2, to complete the proof, we only need to show that $\operatorname{Cov}(\zeta_N) \to \Sigma$. Denote [...]. Thus, [...].
From the last equation we get $\operatorname{Cov}(\zeta_N) \to \Sigma$ as $N \to \infty$.

Results of simulation
To assess the accuracy of the asymptotic results from Section 3, we performed a small simulation study. We considered a two-component mixture ($M = 2$) with mixing probabilities $p^1_{j;N} = j/N$ and $p^2_{j;N} = 1 - p^1_{j;N}$. For each subject, there were two observed variables $X$ and $Y$, which were simulated based on the simple linear regression model [...], where $\kappa_j$ is the number of the component the $j$th observation belongs to, and $X_j$ was simulated as $N(1, 1)$ [...].
The results of simulation are presented in Table 1. The true values of parameters and asymptotic covariances are placed in the last rows of the tables.
The presented data show good concordance with the asymptotic theory for sample sizes $N > 1000$.
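A simulation of this kind can be sketched as follows. This is a hedged illustration, not the authors' experiment: the concentrations $p^1_{j;N} = j/N$ and the regressor distribution $N(1, 1)$ are taken from the text, while the coefficient values in `b_true`, the noise level `sigma`, the intercept column, the sample size, and the form of the weights ($a = p\,\Gamma_N^{-1}$) are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                                   # sample size chosen for this sketch

# Concentrations from the text: p^1_j = j/N, p^2_j = 1 - p^1_j
p1 = np.arange(1, N + 1) / N
p = np.column_stack([p1, 1 - p1])

# Hypothetical (intercept, slope) per component and noise level, not the paper's values
b_true = np.array([[0.5, 2.0], [-0.5, -1.0]])
sigma = 0.5

# Component labels (0-based): kappa_j = 0 with probability p^1_j, otherwise 1
kappa = (rng.random(N) >= p1).astype(int)
x = rng.normal(1.0, 1.0, size=N)              # X_j ~ N(1, 1) as in the text
X = np.column_stack([np.ones(N), x])          # intercept column is an assumption
Y = np.einsum("ij,ij->i", X, b_true[kappa]) + rng.normal(0.0, sigma, size=N)

# Assumed weights a = p Gamma^{-1} with Gamma = p^T p / N
gamma = p.T @ p / N
a = np.linalg.solve(gamma, p.T).T

# Weighted least squares for each component m
b_hat = np.empty_like(b_true)
for m in range(2):
    XtA = X.T * a[:, m]
    b_hat[m] = np.linalg.solve(XtA @ X, XtA @ Y)

print("det Gamma_N:", np.linalg.det(gamma))
print("estimates:\n", b_hat)
```

The component labels `kappa` are used only to generate the data; the estimator sees just `X`, `Y`, and the concentrations `p`.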

Conclusions
We considered a modification of least-squares estimators for linear regression coefficients in the case where observations are obtained from a mixture with varying concentrations. Conditions of consistency and asymptotic normality of the estimators were derived, and dispersion matrices were evaluated. The results of simulations confirm good concordance of the estimators' covariances with the asymptotic formulas for sample sizes larger than 1000 observations.
In real-life data analysis, concentrations (mixing probabilities) are usually not known exactly but estimated. So, to apply the proposed technique, we also need to analyze the sensitivity of the estimates to perturbations of the concentration model. (We are thankful to the anonymous referee for this observation.) It is worth noting that the performance of these estimates will be poor if the true concentrations of the components are nearly linearly dependent ($\det \Gamma_N \approx 0$). We also expect stability of the estimates with respect to concentration perturbations if $\det \Gamma_N$ is bounded away from zero. A deeper analysis of sensitivity will be a part of our further work.
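The role of $\det \Gamma_N$ can be illustrated with a small diagnostic. The sketch below assumes that $\Gamma_N$ is the Gram matrix $(\langle p^l p^k \rangle_N)$ and that the weights are $a = p\,\Gamma_N^{-1}$, in which case $\langle (a^m)^2 \rangle_N = (\Gamma_N^{-1})_{mm}$. Both concentration designs and the function name `gram_diagnostics` are hypothetical examples.

```python
import numpy as np

def gram_diagnostics(p):
    """Return det(Gamma_N) and the mean squared weights <(a^m)^2>_N for each m."""
    n = p.shape[0]
    gamma = p.T @ p / n
    a = np.linalg.solve(gamma, p.T).T         # assumed weights a = p Gamma^{-1}
    return np.linalg.det(gamma), np.mean(a**2, axis=0)

N = 1000
u = np.arange(1, N + 1) / N

# Well-separated concentrations versus nearly proportional ones
well_separated = np.column_stack([u, 1 - u])
nearly_dependent = np.column_stack([0.5 + 0.01 * u, 0.5 - 0.01 * u])

det_good, w_good = gram_diagnostics(well_separated)
det_bad, w_bad = gram_diagnostics(nearly_dependent)
print("well separated:  ", det_good, w_good)
print("nearly dependent:", det_bad, w_bad)
```

When the concentration vectors are close to linearly dependent, the determinant approaches zero and the weights grow large, which inflates the variability of any weighted empirical moment built from them.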