Modern Stochastics: Theory and Applications


Asymptotic normality of local linear regression estimator for mixtures with varying concentrations
Daniel Horbunov, Rostyslav Maiboroda

https://doi.org/10.15559/25-VMSTA282
Pub. online: 2 September 2025      Type: Research Article      Open Access

Received: 27 June 2025
Revised: 6 August 2025
Accepted: 7 August 2025
Published: 2 September 2025

Abstract

Finite mixtures with different regression models for different mixture components arise naturally in the statistical analysis of biological and sociological data. In this paper a model of mixtures with varying concentrations is considered, in which the mixing probabilities differ between observations. The modified local linear regression estimator (mLLRE) is considered for nonparametric estimation of the unknown regression function of a given mixture component. The asymptotic normality of the mLLRE is proved in the case when the regressor’s probability density function has jumps. The theoretically optimal bandwidth is derived. Simulations were performed to assess the accuracy of the normal approximation.

1 Introduction

In medical, biological and sociological studies the investigated population is frequently a mixture of subpopulations (components of the mixture) with different distributions of the observed variables. If the subpopulation to which a subject belongs is not known exactly, the distribution of its variables is a mixture of the subpopulations’ distributions. In the classical finite mixture models (FMM) the concentrations of the components in the mixture (mixing probabilities) are the same for all observations. See [11, 9] and [14] for results on parametric estimation under FMM. In the more flexible mixture with varying concentrations model (MVC) the concentrations differ between observations. See [7, 8] for the theory of nonparametric estimation in these models and their application to DNA-microchip data, and [10] for an application of MVC to the analysis of neurological data.
Regression models are usually applied to describe the dependence between different numerical variables of a subject. In the case of a homogeneous sample there exist many nonparametric estimators of the regression function, such as the Nadaraya–Watson estimator (NWE) and the local linear regression estimator (LLRE) [4]. A modification of the NWE (mNWE) for the estimation of the regression function of an MVC component is presented in [2], which also contains a derivation of the asymptotic normality of the mNWE.
It is well known that for homogeneous samples the NWE exhibits an unacceptably large bias at points where the regressor’s probability density function (PDF) has a discontinuity (jump points). The bias of the LLRE in this case is significantly smaller [3]. A modification of the LLRE for MVC (mLLRE) was considered in [5]. The consistency of the mLLRE was shown in [6], where its performance was also compared to the mNWE by simulations.
In this paper we continue the study of the asymptotic behavior of the mLLRE at jump points and at continuity points of the regressor’s PDF. It is shown that under suitable assumptions the mLLRE is asymptotically normal at jump points as well as at continuity points of the regressor distribution. This result allows one to calculate the theoretically optimal bandwidth for the mLLRE, which minimizes the asymptotic mean squared error.
Semiparametric models similar to the one considered in this paper were discussed in [15, 12] and [13]. In those papers versions of the EM-algorithm are used to estimate the regression functions of the mixture components. Since the EM-algorithm for mixtures is based on iteratively reweighted likelihood maximization, the authors need a parametric model for the error term in the regression model to construct the estimators. In contrast to the EM technique, the approach of this paper is nonparametric both in the regression function and in the distribution of the errors.
The rest of the paper is organized as follows. In Section 2 the mixture of regressions model is described and the definition of the mLLRE is recalled. Section 3 contains the main result on the asymptotic normality of the estimator. In Section 4 optimal bandwidth selection for the mLLRE is discussed. The proof of the main result is presented in Section 5. Simulations for the mLLRE are provided in Section 6. Concluding remarks are placed in Section 7.

2 Mixture of regressions and the locally linear estimator

2.1 Mixture of regressions

Consider a sample of n subjects ${\{{O_{j}}\}_{j=1}^{n}}$. Each subject ${O_{j}}$ belongs to one of the M subpopulations (components of the mixture). For each $j=\overline{1,n}$ the component which contains ${O_{j}}$ is unknown. The numerical index of this component is denoted ${\kappa _{j}}=\kappa ({O_{j}})$, $1\le {\kappa _{j}}\le M$; it is a latent (unobserved) random variable, yet the distributions of ${\kappa _{j}}$ are assumed to be known. The probabilities
(1)
\[ {p_{j:n}^{(k)}}=\mathbf{P}\left({\kappa _{j}}=k\right),\hspace{1em}j=\overline{1,n},\hspace{0.2222em}k=\overline{1,M}\]
are called concentrations of components of the mixture or mixing probabilities.
For each subject ${O_{j}}$ one observes a bivariate vector of numerical variables ${\xi _{j}}=({X_{j}},{Y_{j}})$ where ${X_{j}}=X({O_{j}})$ and ${Y_{j}}=Y({O_{j}})$ are the regressor and response respectively. The distribution of these variables is described by the regression model:
\[ {Y_{j}}={g^{({\kappa _{j}})}}({X_{j}})+{\varepsilon _{j}},\hspace{1em}j=\overline{1,n},\]
where ${g^{(k)}}$ is an unknown regression function for the k-th component of the mixture and ${\varepsilon _{j}}=\varepsilon ({O_{j}})$ is a random error term. It is assumed that the vectors ${\{({X_{j}},{Y_{j}})\}_{j=1}^{n}}$ are mutually independent for any fixed $n\ge 1$, and for all $j=\overline{1,n}$, ${X_{j}}$ and ${\varepsilon _{j}}$ are conditionally independent given $\{{\kappa _{j}}=m\}$, $m=\overline{1,M}$.
For all $k=\overline{1,M}$ the conditional distribution of ${X_{j}}\mid \{{\kappa _{j}}=k\}$ has a Lebesgue density ${f^{(k)}}$, which does not depend on j. We assume that the distributions of the errors ${\varepsilon _{j}}$ satisfy the following conditions:
  • 1. $\mathbf{E}\left[{\varepsilon _{j}}\mid {\kappa _{j}}=k\right]=0$,
  • 2. $\mathbf{Var}\left[{\varepsilon _{j}}\mid {\kappa _{j}}=k\right]={\sigma _{(k)}^{2}}\lt \infty $.

2.2 Minimax weights

In this paper we consider a modified locally linear estimator for ${g^{(m)}}$ at a fixed point ${x_{0}}\in \mathbb{R}$ introduced in [5]. This estimator utilizes minimax weights for the estimation of component distributions (see [7]). Let us recall the construction of these weights.
In what follows the angle brackets mean averaging of a vector:
\[ {\left\langle \mathbf{v}\right\rangle _{n}}=\frac{1}{n}{\sum \limits_{j=1}^{n}}{v_{j}},\hspace{1em}\text{for any}\hspace{2.5pt}\mathbf{v}={({v_{1}},\dots ,{v_{n}})^{T}}\in {\mathbb{R}^{n}}.\]
Arithmetic operations with vectors in the angle brackets are performed entry-wise:
\[ {\left\langle \mathbf{v}\mathbf{u}\right\rangle _{n}}=\frac{1}{n}{\sum \limits_{j=1}^{n}}{v_{j}}{u_{j}}.\]
Consider a set of concentration vectors ${\mathbf{p}^{(m)}}={({p_{1:n}^{(m)}},\dots ,{p_{n:n}^{(m)}})^{T}}$, $m=\overline{1,M}$. Observe that ${\left\langle {\mathbf{p}^{(m)}}{\mathbf{p}^{(k)}}\right\rangle _{n}}$ can be considered as an inner product on ${\mathbb{R}^{n}}$. Assuming that the concentration vectors ${\{{\mathbf{p}^{(m)}}\}_{m=1}^{M}}$ are linearly independent, the Gram matrix ${\Gamma _{n}}={\left({\left\langle {\mathbf{p}^{(k)}}{\mathbf{p}^{(l)}}\right\rangle _{n}}\right)_{k,l=1}^{M}}$ is invertible. The weighting coefficients ${a_{j:n}^{(m)}}$ defined by the formula
(2)
\[ {a_{j:n}^{(m)}}=\frac{1}{\det {\Gamma _{n}}}{\sum \limits_{k=1}^{M}}{(-1)^{m+k}}{\gamma _{km}}{p_{j:n}^{(k)}},\]
where ${\gamma _{km}}$ is the $(k,m)$-th minor of ${\Gamma _{n}}$, are called minimax weighting coefficients. These weights can also be obtained by the formula
\[ ({a_{j:n}^{(1)}},\dots ,{a_{j:n}^{(M)}})=({p_{j:n}^{(1)}},\dots ,{p_{j:n}^{(M)}}){\Gamma _{n}^{-1}}.\]
The vector of minimax coefficients for the m-th component will be denoted by ${\mathbf{a}^{(m)}}={({a_{1:n}^{(m)}},\dots ,{a_{n:n}^{(m)}})^{T}}$. Observe that
(3)
\[ {\left\langle {\mathbf{p}^{(k)}}{\mathbf{a}^{(m)}}\right\rangle _{n}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1,\hspace{1em}& k=m,\\ {} 0,\hspace{1em}& k\ne m,\end{array}\right.\hspace{1em}\hspace{2.5pt}\text{for all}\hspace{2.5pt}m=\overline{1,M}.\]
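For illustration, these weights can be computed numerically as in the following minimal sketch (the function name minimax_weights and the sample concentrations are ours and serve only as an example; NumPy is assumed).

import numpy as np

def minimax_weights(P):
    """Minimax weighting coefficients for mixture components.

    P : (n, M) array whose j-th row is (p_{j:n}^{(1)}, ..., p_{j:n}^{(M)}).
    Returns the (n, M) array A with A[j, m] = a_{j:n}^{(m+1)}.
    """
    n = P.shape[0]
    Gamma = P.T @ P / n                 # Gram matrix of the concentration vectors
    return P @ np.linalg.inv(Gamma)     # row j equals (p_{j:n}^{(1)}, ..., p_{j:n}^{(M)}) Gamma^{-1}

# Example with M = 2 and p_{j:n}^{(1)} = j/n (the design used in Section 6)
n = 1000
p1 = np.arange(1, n + 1) / n
P = np.column_stack([p1, 1.0 - p1])
A = minimax_weights(P)
print(P.T @ A / n)   # property (3): should be (approximately) the identity matrix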

2.3 Construction of an estimator

The modified local linear regression estimator (mLLRE) for ${g^{(m)}}({x_{0}})$ was introduced in [5] as a generalization of local linear regression to data described by the mixture of regressions model of Section 2.1. To define it one needs to choose a kernel function $K:\mathbb{R}\to {\mathbb{R}_{+}}$ and a bandwidth $h\gt 0$. For any $p,q\in {\mathbb{Z}_{+}}$ let
\[ {S_{p,q:n}^{(m)}}=\frac{1}{nh}{\sum \limits_{j=1}^{n}}{a_{j:n}^{(m)}}K\left(\frac{{x_{0}}-{X_{j}}}{h}\right){\left(\frac{{x_{0}}-{X_{j}}}{h}\right)^{p}}{Y_{j}^{q}}.\]
Then the mLLRE can be defined as
(4)
\[ {\hat{g}_{n}^{(m)}}({x_{0}})=\frac{{S_{2,0:n}^{(m)}}{S_{0,1:n}^{(m)}}-{S_{1,1:n}^{(m)}}{S_{1,0:n}^{(m)}}}{{S_{2,0:n}^{(m)}}{S_{0,0:n}^{(m)}}-{({S_{1,0:n}^{(m)}})^{2}}}.\]
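A direct, unoptimized NumPy sketch of (4) may look as follows; the function names mllre and epanechnikov are ours, and the Epanechnikov kernel is the one used later in Section 6.

import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 3/4 (1 - t^2) for |t| < 1."""
    return 0.75 * (1.0 - t**2) * (np.abs(t) < 1.0)

def mllre(x0, X, Y, a, h, K=epanechnikov):
    """Modified local linear regression estimator (4) at the point x0.

    X, Y : regressors and responses (arrays of length n),
    a    : minimax weights a_{j:n}^{(m)} of the target component m,
    h    : bandwidth.
    """
    n = len(X)
    z = (x0 - X) / h
    w = a * K(z) / (n * h)
    S = lambda p, q: np.sum(w * z**p * Y**q)          # S_{p,q:n}^{(m)}
    num = S(2, 0) * S(0, 1) - S(1, 1) * S(1, 0)
    den = S(2, 0) * S(0, 0) - S(1, 0)**2
    return num / den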

3 Asymptotic normality of an estimator

To formulate the result on the asymptotic normality of the mLLRE we need some notation and definitions.
The symbol $\stackrel{\text{W}}{\longrightarrow }$ means weak convergence.
In what follows, the one-sided limits of a function $f(x)$ at a point ${x_{0}}$ are denoted by
\[ f({x_{0}}-)=\underset{x\to {x_{0}}-0}{\lim }f(x),\hspace{1em}f({x_{0}}+)=\underset{x\to {x_{0}}+0}{\lim }f(x),\]
assuming that these limits exist. With this notation, we define
\[\begin{aligned}{}{I_{d}^{(k),-}}& ={f^{(k)}}({x_{0}}+){\int _{-\infty }^{0}}{z^{d}}{(K(z))^{2}}dz,\hspace{0.2222em}{I_{d}^{(k),+}}={f^{(k)}}({x_{0}}-){\int _{0}^{+\infty }}{z^{d}}{(K(z))^{2}}dz.\\ {} {I_{d}^{(k)}}& ={I_{d}^{(k),+}}+{I_{d}^{(k),-}},\\ {} {I_{{d_{x}},{d_{y}}}^{(k)}}& =\left\{\begin{array}{l@{\hskip10.0pt}l}{I_{{d_{x}}}^{(k)}},\hspace{1em}& {d_{y}}=0,\\ {} ({I_{{d_{x}}}^{(k),+}}{g^{(k)}}({x_{0}}-)+{I_{{d_{x}}}^{(k),-}}{g^{(k)}}({x_{0}}+)),\hspace{1em}& {d_{y}}=1,\\ {} ({I_{{d_{x}}}^{(k),+}}({({g^{(k)}}({x_{0}}-))^{2}}+{\sigma _{(k)}^{2}})+{I_{{d_{x}}}^{(k),-}}({({g^{(k)}}({x_{0}}+))^{2}}+{\sigma _{(k)}^{2}})),\hspace{1em}& {d_{y}}=2,\end{array}\right.\\ {} {\Sigma _{{d_{x}}:{d_{y}}}^{(m)}}& ={\sum \limits_{k=1}^{M}}\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{(k)}}\right\rangle {I_{{d_{x}},{d_{y}}}^{(k)}},\end{aligned}\]
\[ {\Sigma ^{(m)}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\Sigma _{0:0}^{(m)}}& {\Sigma _{0:1}^{(m)}}& {\Sigma _{1:0}^{(m)}}& {\Sigma _{1:1}^{(m)}}& {\Sigma _{2:0}^{(m)}}\\ {} {\Sigma _{0:1}^{(m)}}& {\Sigma _{0:2}^{(m)}}& {\Sigma _{1:1}^{(m)}}& {\Sigma _{1:2}^{(m)}}& {\Sigma _{2:1}^{(m)}}\\ {} {\Sigma _{1:0}^{(m)}}& {\Sigma _{1:1}^{(m)}}& {\Sigma _{2:0}^{(m)}}& {\Sigma _{2:1}^{(m)}}& {\Sigma _{3:0}^{(m)}}\\ {} {\Sigma _{1:1}^{(m)}}& {\Sigma _{1:2}^{(m)}}& {\Sigma _{2:1}^{(m)}}& {\Sigma _{2:2}^{(m)}}& {\Sigma _{3:1}^{(m)}}\\ {} {\Sigma _{2:0}^{(m)}}& {\Sigma _{2:1}^{(m)}}& {\Sigma _{3:0}^{(m)}}& {\Sigma _{3:1}^{(m)}}& {\Sigma _{4:0}^{(m)}}\end{array}\right),\]
\[\begin{aligned}{}{u_{p}^{-}}& ={\underset{-\infty }{\overset{0}{\int }}}{z^{p}}K(z)dz,\hspace{1em}{u_{p}^{+}}={\underset{0}{\overset{+\infty }{\int }}}{z^{p}}K(z)dz,\\ {} {u_{p}}& ={u_{p}^{-}}+{u_{p}^{+}},\\ {} {e_{{p_{x}},{p_{y}}}^{(m)}}& ={({g^{(m)}}({x_{0}}))^{{p_{y}}}}\cdot ({f^{(m)}}({x_{0}}-){u_{{p_{x}}}^{+}}+{f^{(m)}}({x_{0}}+){u_{{p_{x}}}^{-}}),\hspace{1em}{p_{y}}\in \{0,1\}.\end{aligned}\]
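For a concrete kernel these constants can be evaluated numerically. The sketch below (our own; it uses scipy.integrate.quad and the Epanechnikov kernel of Section 6) computes the one-sided moments of K and of ${K^{2}}$, from which ${u_{p}^{\pm }}$, ${u_{p}}$ and ${I_{d}^{(k),\pm }}$ are assembled once the one-sided limits of ${f^{(k)}}$ at ${x_{0}}$ are known.

from scipy.integrate import quad

def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) < 1.0 else 0.0

A = 1.0  # support radius of the Epanechnikov kernel (Assumption 6)

def u_minus(p, K=epanechnikov):      # u_p^- = int_{-A}^{0} z^p K(z) dz
    return quad(lambda z: z**p * K(z), -A, 0.0)[0]

def u_plus(p, K=epanechnikov):       # u_p^+ = int_{0}^{A} z^p K(z) dz
    return quad(lambda z: z**p * K(z), 0.0, A)[0]

def ksq_minus(d, K=epanechnikov):    # int_{-A}^{0} z^d K(z)^2 dz
    return quad(lambda z: z**d * K(z)**2, -A, 0.0)[0]

def ksq_plus(d, K=epanechnikov):     # int_{0}^{A} z^d K(z)^2 dz
    return quad(lambda z: z**d * K(z)**2, 0.0, A)[0]

# For example, I_d^{(k),-} = f^{(k)}(x0+) * ksq_minus(d) and
#              I_d^{(k),+} = f^{(k)}(x0-) * ksq_plus(d).
print(u_minus(0) + u_plus(0))        # total mass of the kernel, equals 1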
Now we are ready to formulate our main result on the asymptotic behavior of mLLRE.
Theorem 1.
Assume that the following conditions hold.
  • 1. For all $k=\overline{1,M}$, there exist ${f^{(k)}}({x_{0}}\pm )$, ${g^{(k)}}({x_{0}}\pm )$.
  • 2. ${g^{(m)}}$ is twice continuously differentiable in some neighbourhood B of ${x_{0}}$.
  • 3. For all $k,{k_{1}},{k_{2}}=\overline{1,M}$ the limits
    \[\begin{aligned}{}\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{(k)}}\right\rangle & :=\underset{n\to +\infty }{\lim }{\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{(k)}}\right\rangle _{n}},\\ {} \left\langle {\mathbf{a}^{(m)}}{\mathbf{p}^{({k_{1}})}}{\mathbf{p}^{({k_{2}})}}\right\rangle & :=\underset{n\to \infty }{\lim }{\left\langle {\mathbf{a}^{(m)}}{\mathbf{p}^{({k_{1}})}}{\mathbf{p}^{({k_{2}})}}\right\rangle _{n}}\end{aligned}\]
    exist and are finite.
  • 4. There exists $\underset{n\to \infty }{\lim }{\Gamma _{n}}=\Gamma $, where ${\Gamma _{n}}={({\left\langle {\mathbf{p}^{({k_{1}})}}{\mathbf{p}^{({k_{2}})}}\right\rangle _{n}})_{{k_{1}},{k_{2}}=1}^{M}}$.
  • 5. $h={h_{n}}=H{n^{-1/5}}$.
  • 6. For some $A\gt 0$ and all z such that $|z|\gt A$, $K(z)=0$.
  • 7. Integrals ${\textstyle\int _{-\infty }^{\infty }}|z|K(z)dz$ and ${\textstyle\int _{-\infty }^{\infty }}{z^{4}}{(K(z))^{2}}dz$ are finite.
  • 8. ${f^{(k)}}(x)$, ${g^{(k)}}(x)$ are bounded for $x\in B$ for all $k=\overline{1,M}$.
  • 9. ${e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}}\ne 0$.
  • 10. $\mathbf{E}\left[{\varepsilon _{j}^{4}}\mid {\kappa _{j}}=k\right]\lt \infty $ for all $k=\overline{1,M}$.
Then
(5)
\[ {n^{2/5}}({\hat{g}_{n}^{(m)}}({x_{0}})-{g^{(m)}}({x_{0}}))\stackrel{\text{W}}{\longrightarrow }N({\mu ^{(m)}}({x_{0}}),{S_{(m)}^{2}}({x_{0}})),\]
where ${\mu ^{(m)}}({x_{0}})$ and ${S_{(m)}^{2}}({x_{0}})$ are defined by
\[\begin{aligned}{}{\mu ^{(m)}}({x_{0}})& ={H^{2}}\cdot \frac{{\ddot{g}^{(m)}}({x_{0}})}{2}\cdot \frac{{({e_{2,0}^{(m)}})^{2}}-{e_{1,0}^{(m)}}{e_{3,0}^{(m)}}}{{e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}}},\\ {} {S_{(m)}^{2}}({x_{0}})& =\frac{1}{H}({({g^{(m)}}({x_{0}}))^{2}}{\tilde{\Sigma }_{0}^{(m)}}-2({g^{(m)}}({x_{0}})){\tilde{\Sigma }_{1}^{(m)}}+{\tilde{\Sigma }_{2}^{(m)}}),\\ {} {\tilde{\Sigma }_{k}^{(m)}}& ={\tilde{e}_{2,2}^{(m)}}{\Sigma _{0:k}^{(m)}}-2{\tilde{e}_{1,2}^{(m)}}{\Sigma _{1:k}^{(m)}}+{\tilde{e}_{1,1}^{(m)}}{\Sigma _{2:k}^{(m)}},\\ {} {\tilde{e}_{p,q}^{(m)}}& =\frac{{e_{p,0}^{(m)}}{e_{q,0}^{(m)}}}{{({e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}})^{2}}},\end{aligned}\]
where
\[ {\ddot{g}^{(m)}}(x)=\frac{{d^{2}}{g^{(m)}}(x)}{d{x^{2}}}.\]

4 Optimal bandwidth selection

The mLLRE ${\hat{g}_{n}^{(m)}}({x_{0}})$ defined by (4) depends on the bandwidth h, a tuning parameter that must be selected by the researcher to obtain an accurate estimator. The accuracy of ${\hat{g}_{n}^{(m)}}({x_{0}})$ is usually measured by the mean squared error
\[ \text{MSE}({\hat{g}_{n}^{(m)}}({x_{0}}))=\mathbf{E}[{({\hat{g}_{n}^{(m)}}({x_{0}})-{g^{(m)}}({x_{0}}))^{2}}]=\mathbf{Var}[{\hat{g}_{n}^{(m)}}({x_{0}})]+{(\mathbf{bias}({\hat{g}_{n}^{(m)}}({x_{0}})))^{2}},\]
where $\mathbf{bias}({\hat{g}_{n}^{(m)}}({x_{0}}))=\mathbf{E}[{\hat{g}_{n}^{(m)}}({x_{0}})]-{g^{(m)}}({x_{0}})$ is the estimator’s bias. In Theorem 1 we considered the choice $h={h_{n}}=H{n^{-1/5}}$, where H is a fixed constant. This rate of decay of the bandwidth as $n\to \infty $ is optimal: if ${h_{n}}$ vanishes more slowly, the estimator has an unacceptably large bias, while for a more rapid decay of ${h_{n}}$ the variance of the estimator becomes too large. So we need to choose the best constant H. By Theorem 1, ${n^{2/5}}({\hat{g}_{n}^{(m)}}({x_{0}})-{g^{(m)}}({x_{0}}))$ converges weakly to $\eta \sim N({\mu ^{(m)}}({x_{0}}),{S_{(m)}^{2}}({x_{0}}))$, so we will measure the asymptotic accuracy of ${\hat{g}_{n}^{(m)}}({x_{0}})$ by the asymptotic MSE (aMSE):
\[\begin{aligned}{}\text{aMSE}(H)& =\mathbf{E}[{\eta ^{2}}]={\left({\mu ^{(m)}}({x_{0}})\right)^{2}}+{S_{(m)}^{2}}({x_{0}})={H^{4}}\cdot {E_{(m)}^{2}}+\frac{1}{H}\cdot {V_{(m)}},\end{aligned}\]
where
\[\begin{aligned}{}{E_{(m)}}& =\frac{{\ddot{g}^{(m)}}({x_{0}})}{2}\cdot \frac{{({e_{2,0}^{(m)}})^{2}}-{e_{1,0}^{(m)}}{e_{3,0}^{(m)}}}{{e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}}},\\ {} {V_{(m)}}& ={({g^{(m)}}({x_{0}}))^{2}}{\tilde{\Sigma }_{0}^{(m)}}-2({g^{(m)}}({x_{0}})){\tilde{\Sigma }_{1}^{(m)}}+{\tilde{\Sigma }_{2}^{(m)}}.\end{aligned}\]
An optimal bandwidth constant, which minimizes aMSE, is
\[ {H_{\ast }^{(m)}}={\left(\frac{{V_{(m)}}}{4{E_{(m)}^{2}}}\right)^{1/5}}.\]
Observe that ${H_{\ast }^{(m)}}$ cannot be calculated from the data, since it depends on the unknown distributions of the mixture components. So it is an infeasible, theoretically optimal bandwidth constant, which can be used as a benchmark in comparisons with empirical bandwidth selection rules.
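The following minimal sketch (our own; the values of ${E_{(m)}}$ and ${V_{(m)}}$ below are placeholders) computes ${H_{\ast }^{(m)}}$ from given ${E_{(m)}}$ and ${V_{(m)}}$ and checks numerically that it minimizes $\text{aMSE}(H)$.

import numpy as np

def optimal_H(E_m, V_m):
    """Theoretically optimal bandwidth constant H_*^{(m)} = (V_m / (4 E_m^2))^{1/5}."""
    return (V_m / (4.0 * E_m**2)) ** 0.2

def aMSE(H, E_m, V_m):
    """Asymptotic mean squared error aMSE(H) = H^4 E_m^2 + V_m / H."""
    return H**4 * E_m**2 + V_m / H

E_m, V_m = 1.0, 2.0                     # placeholder values
H_star = optimal_H(E_m, V_m)
grid = np.linspace(0.1, 3.0, 2001)
assert abs(grid[np.argmin(aMSE(grid, E_m, V_m))] - H_star) < 1e-2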

5 Proofs

The proof of Theorem 1 is based on two lemmas.
Let
(6)
\[\begin{array}{l}\displaystyle {\mathbf{S}_{n}^{(m)}}={({S_{0,0}^{(m)}},{S_{0,1}^{(m)}},{S_{1,0}^{(m)}},{S_{1,1}^{(m)}},{S_{2,0}^{(m)}})^{T}},\hspace{3.33333pt}{\mathbf{e}_{n}^{(m)}}=\mathbf{E}[{\mathbf{S}_{n}^{(m)}}],\\ {} \displaystyle {\Delta _{n}^{(m)}}=\sqrt{nh}({\mathbf{S}_{n}^{(m)}}-{\mathbf{e}_{n}^{(m)}}).\end{array}\]
Lemma 1.
Under Assumptions 1–4 and 6–10 of Theorem 1, if $h={h_{n}}\to 0$ and $n{h_{n}}\to \infty $ as $n\to \infty $, then
(7)
\[ {\Delta _{n}^{(m)}}\stackrel{\text{W}}{\longrightarrow }{\Delta _{\infty }^{(m)}}\sim N(\mathbf{0},{\Sigma ^{(m)}}).\]
For any $\mathbf{a}={({a_{0,0}},{a_{0,1}},{a_{1,0}},{a_{1,1}},{a_{2,0}})^{T}}\in {\mathbb{R}^{5}}$ let
(8)
\[ \mathbf{U}(\mathbf{a})=\frac{{a_{2,0}}{a_{0,1}}-{a_{1,1}}{a_{1,0}}}{{a_{2,0}}{a_{0,0}}-{a_{1,0}^{2}}}.\]
Lemma 2.
Under the assumptions of Theorem 1
\[ \mathbf{U}({\mathbf{e}_{n}^{(m)}})-{g^{(m)}}({x_{0}})=\frac{{h^{2}}}{2}\cdot {\ddot{g}^{(m)}}({x_{0}})\cdot \frac{{({e_{2,0}^{(m)}})^{2}}-{e_{1,0}^{(m)}}{e_{3,0}^{(m)}}}{{e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}}}+o({h^{2}}),\hspace{1em}n\to \infty .\]
Proof of Theorem 1.
Consider
(9)
\[ {n^{2/5}}({\hat{g}^{(m)}}({x_{0}})-{g^{(m)}}({x_{0}}))={n^{2/5}}(\mathbf{U}({\mathbf{S}_{n}^{(m)}})-\mathbf{U}({\mathbf{e}_{n}^{(m)}}))+{n^{2/5}}(\mathbf{U}({\mathbf{e}_{n}^{(m)}})-{g^{(m)}}({x_{0}})).\]
Lemma 1 and the continuous mapping theorem (see Theorem 3.1 in [1]) yield
\[ {n^{2/5}}(\mathbf{U}({\mathbf{S}_{n}^{(m)}})-\mathbf{U}({\mathbf{e}_{n}^{(m)}}))\stackrel{\text{W}}{\longrightarrow }\frac{1}{\sqrt{H}}{\dot{\mathbf{U}}^{T}}({\mathbf{e}_{n}^{(m)}}){\Delta _{\infty }^{(m)}},\]
where
\[ \dot{\mathbf{U}}(\mathbf{a})={\left(\frac{d}{d{a_{0,0}}}\mathbf{U}(\mathbf{a}),\frac{d}{d{a_{0,1}}}\mathbf{U}(\mathbf{a}),\frac{d}{d{a_{1,0}}}\mathbf{U}(\mathbf{a}),\frac{d}{d{a_{1,1}}}\mathbf{U}(\mathbf{a}),\frac{d}{d{a_{2,0}}}\mathbf{U}(\mathbf{a})\right)^{T}}\]
is the gradient of U. Tedious but straightforward algebra yields
(10)
\[ \mathbf{Var}\left[\frac{1}{\sqrt{H}}{\dot{\mathbf{U}}^{T}}({\mathbf{e}_{n}^{(m)}}){\Delta _{\infty }^{(m)}}\right]=\frac{1}{H}\dot{\mathbf{U}}{({\mathbf{e}_{n}^{(m)}})^{T}}{\Sigma ^{(m)}}\dot{\mathbf{U}}({\mathbf{e}_{n}^{(m)}})={S_{(m)}^{2}}({x_{0}}).\]
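For the reader’s convenience we record the result of this computation: evaluating the gradient at the limit vector ${\mathbf{e}^{(m)}}=\underset{n\to \infty }{\lim }{\mathbf{e}_{n}^{(m)}}$ and using the identity ${e_{p,1}^{(m)}}={g^{(m)}}({x_{0}}){e_{p,0}^{(m)}}$, which follows from the definitions of Section 3, one gets
\[ \dot{\mathbf{U}}({\mathbf{e}^{(m)}})=\frac{1}{{e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}}}{\left(-{g^{(m)}}({x_{0}}){e_{2,0}^{(m)}},\hspace{0.2778em}{e_{2,0}^{(m)}},\hspace{0.2778em}{g^{(m)}}({x_{0}}){e_{1,0}^{(m)}},\hspace{0.2778em}-{e_{1,0}^{(m)}},\hspace{0.2778em}0\right)^{T}},\]
and substituting this vector into the quadratic form in (10) reproduces the expression for ${S_{(m)}^{2}}({x_{0}})$ given in Theorem 1.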
By Lemma 2 for $h=H{n^{-1/5}}$,
(11)
\[ {n^{2/5}}(\mathbf{U}({\mathbf{e}_{n}^{(m)}})-{g^{(m)}}({x_{0}}))\to \frac{{H^{2}}}{2}\cdot {\ddot{g}^{(m)}}({x_{0}})\cdot \frac{{({e_{2,0}^{(m)}})^{2}}-{e_{1,0}^{(m)}}{e_{3,0}^{(m)}}}{{e_{2,0}^{(m)}}{e_{0,0}^{(m)}}-{({e_{1,0}^{(m)}})^{2}}}.\]
Combining (9)–(11) one obtains the statement of Theorem 1.  □
To prove Lemma 1 we need the Lindeberg–Feller central limit theorem.
Lemma 3.
(Lindeberg’s CLT) Let ${\{{\eta _{j:n}}\}_{j=1}^{n}}$, $n=1,2,\dots \hspace{0.1667em}$, be a set of random vectors in ${\mathbb{R}^{d}}$, satisfying the following assumptions.
  • 1. For any fixed $n\ge 1$, vectors ${\{{\eta _{j:n}}\}_{j=1}^{n}}$ are mutually independent.
  • 2. For all $j=\overline{1,n}$, $n\ge 1$, one has $\mathbf{E}[{\eta _{j:n}}]=\mathbf{0}$.
  • 3. If ${\sigma _{j:n}^{2}}=\operatorname{\mathbf{Cov}}({\eta _{j:n}})$, then for ${\sigma _{n}^{2}}={\textstyle\sum _{j=1}^{n}}{\sigma _{j:n}^{2}}$ there exists
    \[ {\sigma ^{2}}=\underset{n\to \infty }{\lim }{\sigma _{n}^{2}},\]
moreover ${\sigma ^{2}}$ is a positive semidefinite matrix.
  • 4. For some $s\gt 2$ the following convergence holds:
\[ {M_{2}}(s)={\sum \limits_{j=1}^{n}}\mathbf{E}[\min (|{\eta _{j:n}}{|^{2}},|{\eta _{j:n}}{|^{s}})]\to 0,\hspace{1em}n\to \infty .\]
Then
\[ {\sum \limits_{j=1}^{n}}{\eta _{j:n}}\stackrel{\text{W}}{\longrightarrow }N(\mathbf{0},{\sigma ^{2}}).\]
For the proof, see [1], Theorem 8.4.1.
Proof of Lemma 1.
To simplify notation, we formally introduce random vectors $({X_{(m)}},{Y_{(m)}},{\varepsilon _{(m)}})$ with the distribution of $({X_{j}},{Y_{j}},{\varepsilon _{j}})$ given ${\kappa _{j}}=m$.
Conditions of Lemma 3 will be verified for $\{{\eta _{j:n}^{(m)}}\}$, where
\[\begin{aligned}{}{\eta _{j:n}^{(m)}}& ={a_{j:n}^{(m)}}\cdot {\tilde{\eta }^{\prime }_{j:n}},\hspace{0.2222em}{\tilde{\eta }^{\prime }_{j:n}}=({\tilde{\eta }_{j:n}}-\mathbf{E}[{\tilde{\eta }_{j:n}}]),\\ {} {\tilde{\eta }_{j:n}}& =\frac{1}{\sqrt{nh}}\cdot K\left(\frac{{x_{0}}-{X_{j}}}{h}\right){\left(1,\hspace{0.2222em}{Y_{j}},\hspace{0.2222em}\left(\frac{{x_{0}}-{X_{j}}}{h}\right),\hspace{0.2222em}\left(\frac{{x_{0}}-{X_{j}}}{h}\right){Y_{j}},\hspace{0.2222em}{\left(\frac{{x_{0}}-{X_{j}}}{h}\right)^{2}}\right)^{T}}\hspace{-0.1667em}\hspace{-0.1667em}.\end{aligned}\]
Similarly, we define random variables ${\tilde{\eta }_{(m):n}}$ (${\tilde{\eta }^{\prime }_{(m):n}}$) that have the distribution of ${\tilde{\eta }_{j:n}}$ (${\tilde{\eta }^{\prime }_{j:n}}$) given ${\kappa _{j}}=m$. Obviously, ${\Delta _{n}^{(m)}}={\textstyle\sum _{j=1}^{n}}{\eta _{j:n}^{(m)}}$.
The first condition of Lemma 3 holds since $({X_{j}},{Y_{j}})$ are independent for different j. The second condition follows from the construction of ${\eta _{j:n}^{(m)}}$.
We now proceed to the third condition of Lemma 3. For any ${p_{x}},{p_{y}},{q_{x}},{q_{y}}$, consider the covariance of the corresponding entries of ${\Delta _{n}^{(m)}}$:
\[ {\Sigma _{{p_{x}},{p_{y}}:{q_{x}},{q_{y}}}^{(m)}}(n)=nh\operatorname{\mathbf{Cov}}({S_{{p_{x}},{p_{y}}}^{(m)}},{S_{{q_{x}},{q_{y}}}^{(m)}})={Q_{1}^{(m)}}(n,h)-{Q_{2}^{(m)}}(n,h),\]
\[\begin{aligned}{}{Q_{1}^{(m)}}(n,h)& =\frac{1}{nh}{\sum \limits_{j=1}^{n}}{({a_{j:n}^{(m)}})^{2}}\mathbf{E}\left[{\left(K\left(\frac{{x_{0}}-{X_{j}}}{h}\right)\right)^{2}}{\left(\frac{{x_{0}}-{X_{j}}}{h}\right)^{{p_{x}}+{q_{x}}}}{Y_{j}^{{p_{y}}+{q_{y}}}}\right],\\ {} {Q_{2}^{(m)}}(n,h)& =\frac{1}{nh}{\sum \limits_{j=1}^{n}}{({a_{j:n}^{(m)}})^{2}}\mathbf{E}\left[K\left(\frac{{x_{0}}-{X_{j}}}{h}\right){\left(\frac{{x_{0}}-{X_{j}}}{h}\right)^{{p_{x}}}}{Y_{j}^{{p_{y}}}}\right]\\ {} & \hspace{1em}\times \mathbf{E}\left[K\left(\frac{{x_{0}}-{X_{j}}}{h}\right){\left(\frac{{x_{0}}-{X_{j}}}{h}\right)^{{q_{x}}}}{Y_{j}^{{q_{y}}}}\right].\end{aligned}\]
We will investigate ${Q_{1}^{(m)}}(n,h)$ and ${Q_{2}^{(m)}}(n,h)$ separately. First of all, note that
\[\begin{aligned}{}& {Q_{1}^{(m)}}(n,h)\\ {} & \hspace{1em}=\frac{1}{h}{\sum \limits_{k=1}^{M}}{\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{(k)}}\right\rangle _{n}}\mathbf{E}\left[{\left(K\left(\frac{{x_{0}}-{X_{(k)}}}{h}\right)\right)^{2}}{\left(\frac{{x_{0}}-{X_{(k)}}}{h}\right)^{{p_{x}}+{q_{x}}}}{Y_{(k)}^{{p_{y}}+{q_{y}}}}\right].\end{aligned}\]
Consider the expectations in the sum and denote ${d_{x}}={p_{x}}+{q_{x}}$ and ${d_{y}}={p_{y}}+{q_{y}}\le 4$. Then, for all $k=\overline{1,M}$,
\[\begin{aligned}{}& \frac{1}{h}\mathbf{E}\left[{\left(K\left(\frac{{x_{0}}-{X_{(k)}}}{h}\right)\right)^{2}}{\left(\frac{{x_{0}}-{X_{(k)}}}{h}\right)^{{p_{x}}+{q_{x}}}}{Y_{(k)}^{{p_{y}}+{q_{y}}}}\right]\\ {} & \hspace{1em}={\sum \limits_{l=0}^{{d_{y}}}}\left(\genfrac{}{}{0.0pt}{}{{d_{y}}}{l}\right)\cdot \mathbf{E}\left[{\varepsilon _{(k)}^{{d_{y}}-l}}\right]\cdot {\underset{-\infty }{\overset{+\infty }{\int }}}{(K(z))^{2}}{z^{{d_{x}}}}{({g^{(k)}}({x_{0}}-hz))^{l}}{f^{(k)}}({x_{0}}-hz)dz,\end{aligned}\]
where $\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)=n!/(k!(n-k)!)$ is the binomial coefficient. By Assumptions 1, 6 and 7 we obtain
\[\begin{aligned}{}& {\underset{-\infty }{\overset{+\infty }{\int }}}{(K(z))^{2}}{z^{{d_{x}}}}{({g^{(k)}}({x_{0}}-hz))^{l}}{f^{(k)}}({x_{0}}-hz)dz\\ {} & \hspace{1em}\to {({g^{(k)}}({x_{0}}-))^{l}}{I_{{d_{x}}}^{(k),+}}+{({g^{(k)}}({x_{0}}+))^{l}}{I_{{d_{x}}}^{(k),-}}\end{aligned}\]
as $n\to \infty $. So, for ${d_{x}}\in \{0,1,2,3,4\}$ and ${d_{y}}\in \{0,1,2\}$,
\[\begin{aligned}{}& \frac{1}{h}\mathbf{E}\left[{\left(K\left(\frac{{x_{0}}-{X_{(k)}}}{h}\right)\right)^{2}}{\left(\frac{{x_{0}}-{X_{(k)}}}{h}\right)^{{d_{x}}}}{Y_{(k)}^{{d_{y}}}}\right]\to {I_{{d_{x}},{d_{y}}}^{(k)}},\hspace{1em}n\to \infty .\end{aligned}\]
From the assumption $\mathbf{E}[{\varepsilon _{(k)}}]=0$ and Assumption 3, we obtain
\[ {Q_{1}^{(m)}}(n,h)\to {\sum \limits_{k=1}^{M}}\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{(k)}}\right\rangle {I_{{d_{x}},{d_{y}}}^{(k)}},\hspace{1em}n\to \infty .\]
Now we will show that ${Q_{2}^{(m)}}(n,h)\to 0$ as $n\to \infty $. Note that
\[\begin{aligned}{}{Q_{2}^{(m)}}(n,h)& =h{\sum \limits_{{k_{1}},{k_{2}}=1}^{M}}{\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{({k_{1}})}}{\mathbf{p}^{({k_{2}})}}\right\rangle _{n}}{Q_{{p_{x}},{p_{y}}}^{({k_{1}})}}(n,h){Q_{{q_{x}},{q_{y}}}^{({k_{2}})}}(n,h),\\ {} {Q_{{p_{x}},{p_{y}}}^{(k)}}(n,h)& ={\underset{-\infty }{\overset{+\infty }{\int }}}{\underset{-\infty }{\overset{+\infty }{\int }}}K(z){z^{{p_{x}}}}{({g^{(k)}}({x_{0}}-hz)+u)^{{p_{y}}}}{f^{(k)}}({x_{0}}-hz)dzd{F_{\varepsilon }^{(k)}}(u),\end{aligned}\]
where ${F_{\varepsilon }^{(k)}}(u)=\mathbf{P}\left({\varepsilon _{(k)}}\lt u\right)$ is the cumulative distribution function of ${\varepsilon _{(k)}}$. The multiple integrals ${Q_{{p_{x}},{p_{y}}}^{(k)}}(n,h)$ are bounded for $n\ge 1$, and, since ${p_{j:n}^{({k_{2}})}}\le 1$, the averages ${\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{({k_{1}})}}{\mathbf{p}^{({k_{2}})}}\right\rangle _{n}}$ are bounded by Assumption 3; together with $h\to 0$ this yields ${Q_{2}^{(m)}}(n,h)\to 0$ as $n\to \infty $.
Combining the asymptotics of ${Q_{1}^{(m)}}(n,h)$ and ${Q_{2}^{(m)}}(n,h)$ as $n\to \infty $, we obtain the asymptotics of covariances for ${\Delta _{n}^{(m)}}$:
\[ {\Sigma _{{p_{x}},{p_{y}}:{q_{x}},{q_{y}}}^{(m)}}(n)\to {\sum \limits_{k=1}^{M}}\left\langle {({\mathbf{a}^{(m)}})^{2}}{\mathbf{p}^{(k)}}\right\rangle {I_{{p_{x}}+{q_{x}},{p_{y}}+{q_{y}}}^{(k)}}={\Sigma _{{p_{x}}+{q_{x}}:{p_{y}}+{q_{y}}}^{(m)}},\hspace{1em}n\to \infty .\]
The third condition of Lemma 3 holds.
Finally we will show that the fourth condition of Lemma 3 holds. For some $s\gt 2$, note that
\[\begin{aligned}{}{M_{2}}(s)& ={\sum \limits_{j=1}^{n}}\mathbf{E}\left[\min (|{\eta _{j:n}}{|^{2}},|{\eta _{j:n}}{|^{s}})\right]\\ {} & ={\sum \limits_{j=1}^{n}}{\sum \limits_{k=1}^{M}}{p_{j:n}^{(k)}}\mathbf{E}\left[\min (|{a_{j:n}^{(m)}}{|^{2}}\cdot |{\tilde{\eta }^{\prime }_{(k):n}}{|^{2}},|{a_{j:n}^{(m)}}{|^{s}}\cdot |{\tilde{\eta }^{\prime }_{(k):n}}{|^{s}})\right]\\ {} & \le {C_{\Gamma }}\cdot {\sum \limits_{j=1}^{n}}{\sum \limits_{k=1}^{M}}\mathbf{E}\left[\min (|{\tilde{\eta }^{\prime }_{(k):n}}{|^{2}},|{\tilde{\eta }^{\prime }_{(k):n}}{|^{s}})\right],\end{aligned}\]
since ${p_{j:n}^{(k)}}\le 1$ and $\max (|{a_{j:n}^{(m)}}{|^{2}},|{a_{j:n}^{(m)}}{|^{s}})\le \max (1,{\sup _{j=\overline{1,n}}}|{a_{j:n}^{(m)}}{|^{s}})={C_{\Gamma }}\lt \infty $. By the inequality
(12)
\[ |\mathbf{a}+\mathbf{b}{|^{s}}\le {2^{s-1}}(|\mathbf{a}{|^{s}}+|\mathbf{b}{|^{s}}),\hspace{1em}\text{for any}\hspace{2.5pt}\mathbf{a},\mathbf{b}\in {\mathbb{R}^{d}},\]
we obtain
\[\begin{aligned}{}& {\sum \limits_{j=1}^{n}}{\sum \limits_{k=1}^{M}}\mathbf{E}\left[\min (|{\tilde{\eta }^{\prime }_{(k):n}}{|^{2}},|{\tilde{\eta }^{\prime }_{(k):n}}{|^{s}})\right]\\ {} & \le {2^{s-1}}\cdot {\sum \limits_{k=1}^{M}}n\cdot \left(\mathbf{E}\left[\min \left(|{\tilde{\eta }_{(k):n}}{|^{2}},|{\tilde{\eta }_{(k):n}}{|^{s}}\right)\right]+\max \left(|\mathbf{E}\left[{\tilde{\eta }_{(k):n}}\right]{|^{2}},|\mathbf{E}\left[{\tilde{\eta }_{(k):n}}\right]{|^{s}}\right)\right).\end{aligned}\]
We will show that, as $n\to \infty $,
(13)
\[\begin{aligned}{}& n\cdot \mathbf{E}\left[\min \left(|{\tilde{\eta }_{(k):n}}{|^{2}},|{\tilde{\eta }_{(k):n}}{|^{s}}\right)\right]\to 0,\end{aligned}\]
(14)
\[\begin{aligned}{}& n\cdot \max \left(|\mathbf{E}[{\tilde{\eta }_{(k):n}}]{|^{2}},|\mathbf{E}[{\tilde{\eta }_{(k):n}}]{|^{s}}\right)\to 0.\end{aligned}\]
Let us show (14). Observe that for any $p\ge 2$
(15)
\[ n\cdot |\mathbf{E}[{\tilde{\eta }_{(k):n}}]{|^{p}}\to 0,\hspace{1em}n\to \infty .\]
Indeed, the left-hand side of (15) can be expressed as follows:
\[ n\cdot |\mathbf{E}[{\tilde{\eta }_{(k):n}}]{|^{p}}=n\cdot {\left({E_{0,0}^{(k)}}+{E_{0,1}^{(k)}}+{E_{1,0}^{(k)}}+{E_{1,1}^{(k)}}+{E_{2,0}^{(k)}}\right)^{p/2}},\]
where ${E_{{p_{x}},{p_{y}}}^{(k)}}={({(nh)^{-1/2}}\cdot \mathbf{E}[K(({x_{0}}-{X_{(k)}})/h){(({x_{0}}-{X_{(k)}})/h)^{{p_{x}}}}{Y_{(k)}^{{p_{y}}}}])^{2}}$.
For instance,
\[ {E_{2,0}^{(k)}}\le \frac{h}{n}\cdot {\left({\underset{-\infty }{\overset{+\infty }{\int }}}{z^{2}}K(z){f^{(k)}}({x_{0}}-hz)dz\right)^{2}}\sim \frac{h}{n}\cdot {C_{2,0}^{(k)}},\]
as $n\to \infty $, where
\[ {C_{2,0}^{(k)}}={\left({f^{(k)}}({x_{0}}+){u_{2}^{-}}+{f^{(k)}}({x_{0}}-){u_{2}^{+}}\right)^{2}}.\]
By similar reasoning for the other terms we obtain
(16)
\[ n\cdot |\mathbf{E}[{\tilde{\eta }_{(k):n}}]{|^{p}}\le n\cdot {\left(\frac{h}{n}\right)^{p/2}}\cdot {C^{(k)}},\]
for some ${C^{(k)}}\lt \infty $.
Since ${h_{n}}\to 0$,
(17)
\[ {n^{1-p/2}}\cdot {h^{p/2}}\to 0,\hspace{1em}n\to \infty ,\]
because $p/2\ge 1$. Then (16) and (17) yield (14).
To show (13), observe that for any $\tau \gt 0$
(18)
\[ n\cdot \mathbf{E}\left[\min \left(|{\tilde{\eta }_{(k):n}}{|^{2}},|{\tilde{\eta }_{(k):n}}{|^{s}}\right)\right]\le {Z_{n}}(\tau )+{\tau ^{s-2}}{Z_{n}}(0),\]
where
(19)
\[ {Z_{n}}(\tau )=n\cdot \mathbf{E}\left[|{\tilde{\eta }_{(k):n}}{|^{2}}\mathbf{1}\{|{\tilde{\eta }_{(k):n}}|\ge \tau \}\right].\]
We will show that ${Z_{n}}(0)$ is bounded and ${Z_{n}}(\tau )\to 0$ as $n\to \infty $ for any $\tau \gt 0$. So, taking τ small enough we can make the right-hand side of (18) as small as desired.
Let
\[ V(z,x,u)={K^{2}}(z)(1+{({g^{(k)}}(x)+u)^{2}}+{z^{2}}+{z^{2}}{({g^{(k)}}(x)+u)^{2}}+{z^{4}}).\]
Then
\[\begin{aligned}{}& {Z_{n}}(\tau )=\frac{1}{h}\mathbf{E}\left[V\left(\frac{{x_{0}}-{X_{(k)}}}{h},{X_{(k)}},{\varepsilon _{(k)}}\right)\mathbf{1}\left\{V\left(\frac{{x_{0}}-{X_{(k)}}}{h},{X_{(k)}},{\varepsilon _{(k)}}\right)\gt {\tau ^{2}}nh\right\}\right]\\ {} & \hspace{1em}=\frac{1}{h}{\int _{{x_{0}}-Ah}^{{x_{0}}+Ah}}\mathbf{E}\left[V\left(\frac{{x_{0}}-x}{h},x,{\varepsilon _{(k)}}\right)\mathbf{1}\left\{V\left(\frac{{x_{0}}-x}{h},x,{\varepsilon _{(k)}}\right)\gt {\tau ^{2}}nh\right\}\right]{f^{(k)}}(x)dx\\ {} & \hspace{1em}={\int _{-A}^{A}}\mathbf{E}\left[V\left(z,{x_{0}}-hz,{\varepsilon _{(k)}}\right)\mathbf{1}\left\{V\left(z,{x_{0}}-hz,{\varepsilon _{(k)}}\right)\gt {\tau ^{2}}nh\right\}\right]{f^{(k)}}({x_{0}}-hz)dz.\end{aligned}\]
By Assumption 8, ${g^{(k)}}$ and ${f^{(k)}}$ are bounded in a neighborhood B of ${x_{0}}$. For n large enough, $[{x_{0}}-hA,{x_{0}}+hA]\subset B$, so for $-A\le z\le A$,
(20)
\[ V\left(z,{x_{0}}-hz,u\right){f^{(k)}}({x_{0}}-hz)\le \bar{V}(z,u),\]
where
\[\begin{array}{l}\displaystyle \bar{V}(z,u)=\bar{f}{K^{2}}(z)(1+{(\bar{g}+|u|)^{2}}+{z^{2}}+{z^{2}}{(\bar{g}+|u|)^{2}}+{z^{4}}),\\ {} \displaystyle \bar{f}=\underset{x\in B}{\sup }{f^{(k)}}(x),\hspace{1em}\bar{g}=\underset{x\in B}{\sup }|{g^{(k)}}(x)|.\end{array}\]
By Assumptions 7 and 10,
(21)
\[ {\int _{-A}^{A}}\mathbf{E}\left[\bar{V}(z,{\varepsilon _{(k)}})\right]dz={V^{\ast }}\lt \infty .\]
So, by (20) and (21), ${Z_{n}}(0)\le {V^{\ast }}$.
Observe that
\[ \mathbf{1}\left\{V\left(\frac{{x_{0}}-x}{h},x,u\right)\gt {\tau ^{2}}nh\right\}\to 0\]
as $n\to \infty $, since $nh\to \infty $ by the assumptions of Lemma 1. So, with (20) and (21) in mind, by the Lebesgue dominated convergence theorem we obtain ${Z_{n}}(\tau )\to 0$ as $n\to \infty $ for any $\tau \gt 0$.
Thus, for any $\delta \gt 0$ we can take $\tau \gt 0$ so small that ${\tau ^{s-2}}{Z_{n}}(0)\le {\tau ^{s-2}}{V^{\ast }}\lt \delta /2$ and then ${n_{0}}$ so large that ${Z_{n}}(\tau )\lt \delta /2$ for $n\gt {n_{0}}$. Combining this with (14) and (18) yields ${M_{2}}(s)\to 0$ as $n\to \infty $, so the fourth condition of Lemma 3 holds.
Applying Lemma 3 to ${\eta _{j:n}^{(m)}}$ we obtain the statement of Lemma 1.  □
Proof of Lemma 2.
Consider ${c_{n}^{(m)}}\hspace{0.1667em}=\hspace{0.1667em}{e_{2,0:n}^{(m)}}{e_{0,1:n}^{(m)}}-{e_{1,1:n}^{(m)}}{e_{1,0:n}^{(m)}}$ and ${d_{n}^{(m)}}\hspace{0.1667em}=\hspace{0.1667em}{e_{2,0:n}^{(m)}}{e_{0,0:n}^{(m)}}-{({e_{1,0:n}^{(m)}})^{2}}$, where
(22)
\[ {e_{{p_{x}},{p_{y}}:n}^{(m)}}=\mathbf{E}[{S_{{p_{x}},{p_{y}}}^{(m)}}]={\underset{-\infty }{\overset{+\infty }{\int }}}K(z){z^{{p_{x}}}}{({g^{(m)}}({x_{0}}-hz))^{{p_{y}}}}{f^{(m)}}({x_{0}}-hz)dz,\hspace{1em}{p_{y}}\in \{0,1\}.\]
By continuity of U and convergence ${\mathbf{e}_{n}^{(m)}}\to {\mathbf{e}^{(m)}}$, we get $\mathbf{U}({\mathbf{e}_{n}^{(m)}})\to \mathbf{U}({\mathbf{e}^{(m)}})={g^{(m)}}({x_{0}})$.
We will examine the rate of convergence to zero for the difference
(23)
\[ \mathbf{U}({\mathbf{e}_{n}^{(m)}})-{g^{(m)}}({x_{0}})=\frac{{c_{n}^{(m)}}-{d_{n}^{(m)}}{g^{(m)}}({x_{0}})}{{d_{n}^{(m)}}}.\]
From (22) one obtains
(24)
\[ {c_{n}^{(m)}}={\underset{-\infty }{\overset{+\infty }{\int }}}{g^{(m)}}({x_{0}}-hz)K(z)({e_{2,0:n}^{(m)}}-z{e_{1,0:n}^{(m)}}){f^{(m)}}({x_{0}}-hz)dz\]
and
(25)
\[ {d_{n}^{(m)}}={\underset{-\infty }{\overset{+\infty }{\int }}}K(z)({e_{2,0:n}^{(m)}}-z{e_{1,0:n}^{(m)}}){f^{(m)}}({x_{0}}-hz)dz.\]
By Taylor’s expansion for ${g^{(m)}}$ in the neighborhood of ${x_{0}}$, we obtain, as $n\to \infty $,
\[\begin{aligned}{}& {c_{n}^{(m)}}-{d_{n}^{(m)}}{g^{(m)}}({x_{0}})\\ {} & \hspace{1em}={\underset{-\infty }{\overset{+\infty }{\int }}}({g^{(m)}}({x_{0}}-hz)-{g^{(m)}}({x_{0}}))K(z)({e_{2,0:n}^{(m)}}-z{e_{1,0:n}^{(m)}}){f^{(m)}}({x_{0}}-hz)dz\\ {} & \hspace{1em}={\dot{g}^{(m)}}({x_{0}})(-h){\underset{-A}{\overset{A}{\int }}}zK(z)({e_{2,0:n}^{(m)}}-z{e_{1,0:n}^{(m)}}){f^{(m)}}({x_{0}}-hz)dz\\ {} & \hspace{2em}+\frac{{h^{2}}}{2}\cdot {\ddot{g}^{(m)}}({x_{0}}){\underset{-A}{\overset{A}{\int }}}{z^{2}}K(z)({e_{2,0:n}^{(m)}}-z{e_{1,0:n}^{(m)}}){f^{(m)}}({x_{0}}-hz)dz\\ {} & \hspace{2em}+{\underset{-A}{\overset{A}{\int }}}R(hz)K(z)({e_{2,0:n}^{(m)}}-z{e_{1,0:n}^{(m)}}){f^{(m)}}({x_{0}}-hz)dz=:{J_{1:n}^{(m)}}+{J_{2:n}^{(m)}}+{J_{3:n}^{(m)}},\end{aligned}\]
where $R(t)$ is a function such that $|R(t)|/{t^{2}}\to 0$ as $t\to 0$. By (22),
(26)
\[ {J_{1:n}^{(m)}}=0\]
and
(27)
\[ {J_{2:n}^{(m)}}=\frac{{h^{2}}}{2}\cdot {\ddot{g}^{(m)}}({x_{0}})({({e_{2,0:n}^{(m)}})^{2}}-{e_{1,0:n}^{(m)}}{e_{3,0:n}^{(m)}}).\]
The asymptotics of
(28)
\[ {J_{3:n}^{(m)}}={e_{2,0:n}^{(m)}}{J_{R,0:n}^{(m)}}-{e_{1,0:n}^{(m)}}{J_{R,1:n}^{(m)}}\]
remains to be examined as $n\to \infty $, where
\[ {J_{R,p:n}^{(m)}}={\int _{-A}^{A}}R(hz){z^{p}}K(z){f^{(m)}}({x_{0}}-hz)dz.\]
and $p\in \{0,1\}$. Since ${e_{p,q:n}^{(m)}}\to {e_{p,q}^{(m)}}$, it suffices to investigate the asymptotics of ${J_{R,p:n}^{(m)}}$.
Note that for ${J_{R,p:n}^{(m),+}}={\textstyle\int _{0}^{A}}R(hz){z^{p}}K(z){f^{(m)}}({x_{0}}-hz)dz$,
\[ {J_{R,p:n}^{(m),+}}\sim {f^{(m)}}({x_{0}}-){\int _{0}^{A}}R(hz){z^{p}}K(z)dz,\hspace{1em}n\to \infty .\]
For any $\varepsilon \in (0,1)$ there exists $N(\varepsilon )$ such that $|R(hz)|\le \varepsilon {(hz)^{2}}$ for all $|z|\le A$ and $n\ge N(\varepsilon )$. Then, for $n\ge N(\varepsilon )$,
(29)
\[\begin{aligned}{}& \bigg|{\int _{0}^{A}}R(hz){z^{p}}K(z)dz\bigg|\le \varepsilon {h^{2}}{\int _{0}^{A}}{z^{2+p}}K(z)dz=o({h^{2}}),\hspace{1em}n\to \infty .\end{aligned}\]
Similarly
\[ {J_{R,p:n}^{(m),-}}={\int _{-A}^{0}}R(hz){z^{p}}K(z){f^{(m)}}({x_{0}}-hz)dz=o({h^{2}}),\hspace{1em}n\to \infty .\]
Thus,
(30)
\[ {J_{R,p:n}^{(m)}}={J_{R,p:n}^{(m),+}}+{J_{R,p:n}^{(m),-}}=o({h^{2}}),\hspace{1em}n\to \infty .\]
From (28) and (30), we get
(31)
\[ {J_{3:n}^{(m)}}=o({h^{2}}),\hspace{1em}n\to \infty .\]
From (26), (27) and (31), we obtain the statement of Lemma 2.  □

6 Simulations

6.1 Description of simulations

For simulations we considered a mixture of regressions with $M=2$ components. The concentrations $\{{p_{j:n}^{(m)}}\}$ were defined by
\[ {p_{j:n}^{(1)}}=\frac{j}{n},\hspace{1em}{p_{j:n}^{(2)}}=1-\frac{j}{n},\hspace{1em}j=\overline{1,n}.\]
The distribution of regressor ${X_{j}}$ was the same for both components. Its PDF was
\[ f(t)=\frac{3}{2}\cdot {\mathbf{1}_{(0,1/2]}}(t)+\frac{1}{2}\cdot {\mathbf{1}_{(1/2,1)}}(t),\hspace{1em}t\in \mathbb{R}.\]
The distribution of ${\varepsilon _{j}}$ differed between the experiments. The regression functions were defined as
\[ {g^{(m)}}(t)=\left\{\begin{array}{l@{\hskip10.0pt}l}\sin (2\pi t),\hspace{1em}& m=1,\\ {} \cos (2\pi t),\hspace{1em}& m=2.\end{array}\right.\]
Estimation was performed at ${x_{0}}=1/2$, which is a discontinuity point of $f(t)$. The simulation procedure was as follows:
  • 1. For each sample size $n\in \{100,500,1000,5000,10000\}$, we generate $B=1000$ copies of ${\{({X_{j}},{Y_{j}})\}_{j=\overline{1,n}}}$ from the described model.
  • 2. In each copy, the modified local linear regression estimator ${\hat{g}_{n}^{(m)}}({x_{0}})$ is computed at ${x_{0}}$ for each $m=\overline{1,M}$.
  • 3. From the array of B values of ${\hat{g}_{n}^{(m)}}({x_{0}})$, we compute the scaled sample bias and standard deviation
    \[\begin{aligned}{}{\text{Bias}_{n}^{(m)}}& ={n^{2/5}}({\mathbf{E}_{\ast }}[{\hat{g}^{(m)}}({x_{0}})]-{g^{(m)}}({x_{0}})),\\ {} {\text{Std}_{n}^{(m)}}& ={n^{2/5}}\cdot \sqrt{{\mathbf{Var}_{\ast }}[{\hat{g}^{(m)}}({x_{0}})]}.\end{aligned}\]
For ${\hat{g}_{n}^{(m)}}({x_{0}})$ we select an optimal bandwidth $h={H_{\ast }^{(m)}}{n^{-1/5}}$ and the Epanechnikov kernel
\[ K(t)=\mathbf{1}\{|t|\lt 1\}\cdot \frac{3}{4}\cdot (1-{t^{2}}).\]
In this scenario, ${H_{\ast }^{(1)}}$ does not exist since ${E_{(1)}}=0$. So, in the experiments, we let ${H^{(1)}}={H_{\ast }^{(2)}}$.
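A condensed sketch of this simulation (our own illustrative code; it reuses the minimax_weights and mllre sketches from Section 2, uses fewer replications than B = 1000 for speed, and treats 1.25 as the error variance in Experiment 1) could look as follows.

import numpy as np

rng = np.random.default_rng(0)

def sample_X(size):
    """Draw X from the density f: 3/2 on (0, 1/2] and 1/2 on (1/2, 1)."""
    u = rng.uniform(size=size)
    # F(1/2) = 3/4; invert the piecewise-linear CDF
    return np.where(u < 0.75, u / 1.5, 0.5 + 2.0 * (u - 0.75))

g = [lambda t: np.sin(2 * np.pi * t), lambda t: np.cos(2 * np.pi * t)]

def simulate(n, B, H, x0=0.5):
    p1 = np.arange(1, n + 1) / n
    P = np.column_stack([p1, 1.0 - p1])          # concentrations p^(1), p^(2)
    A = minimax_weights(P)                       # sketch from Section 2.2
    h = H * n ** (-0.2)
    est = np.empty((B, 2))
    for b in range(B):
        kappa = (rng.uniform(size=n) < P[:, 1]).astype(int)   # 0 -> component 1, 1 -> component 2
        X = sample_X(n)
        eps = rng.normal(0.0, np.sqrt(1.25), size=n)          # Experiment 1 errors
        Y = np.where(kappa == 0, g[0](X), g[1](X)) + eps
        est[b] = [mllre(x0, X, Y, A[:, m], h) for m in range(2)]  # sketch from Section 2.3
    scale = n ** 0.4
    for m in range(2):
        print(f"m={m + 1}: Bias={scale * (est[:, m].mean() - g[m](x0)):.4f}, "
              f"Std={scale * est[:, m].std(ddof=1):.4f}")

simulate(n=1000, B=200, H=0.6261)    # H taken equal to H_*^{(2)} for both components

Re-running the loop with the other error distributions of Experiments 2 and 3 only requires changing the line that generates eps.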

6.2 Performance of the mLLRE

Experiment 1.  In this experiment, we consider ${\varepsilon _{j}}\sim N(0,1.25)$. The results of Experiment 1 for the mLLRE are presented in Table 1.
Table 1.
Computed values of sample bias and standard deviation for each n and m
n       ${\text{Bias}_{n}^{(1)}}$   ${\text{Std}_{n}^{(1)}}$   ${\text{Bias}_{n}^{(2)}}$   ${\text{Std}_{n}^{(2)}}$
100     0.7239    13.021    −3.5882   149.4207
500     0.2935    2.6544    1.0857    2.692
1000    0.0979    2.5753    1.3031    2.6607
5000    0.1256    2.6908    1.3599    2.6578
10000   −0.0103   2.5745    1.2657    2.5809
∞       0         2.6547    1.3274    2.6547
Here ${H_{\ast }^{(2)}}\approx 0.6261$. For large n, the simulation results agree with the asymptotic calculations.
Experiment 2.  In this experiment, we consider ${\varepsilon _{j}}\sim {T_{5}}$, where ${T_{5}}$ is Student’s t-distribution with 5 degrees of freedom. The results of Experiment 2 for the mLLRE are presented in Table 2.
Here ${H_{\ast }^{(2)}}\approx 0.6575$. These results are also in accordance with the asymptotic calculations.
Table 2.
Computed values of sample bias and standard deviation for each n and m
n       ${\text{Bias}_{n}^{(1)}}$   ${\text{Std}_{n}^{(1)}}$   ${\text{Bias}_{n}^{(2)}}$   ${\text{Std}_{n}^{(2)}}$
100     0.1082    4.4768    0.6359    6.724
500     0.1659    3.059     0.9511    3.0376
1000    0.1577    3.0195    1.1993    3.0792
5000    0.1386    2.9185    1.1616    3.012
10000   0.2131    2.8123    1.2613    2.9326
∞       0         2.9282    1.4641    2.9282
Table 3.
Computed values of sample bias and standard deviation for each n and m
n       ${\text{Bias}_{n}^{(1)}}$   ${\text{Std}_{n}^{(1)}}$   ${\text{Bias}_{n}^{(2)}}$   ${\text{Std}_{n}^{(2)}}$
100     0.0616    4.5498    0.9148    5.3876
500     0.2661    2.8404    1.1379    2.9482
1000    0.2239    2.7614    1.1546    2.9566
5000    −0.0964   2.6225    1.4349    2.8296
10000   0.0435    2.626     1.3573    2.9192
∞       0         2.6938    1.4317    2.8635
Experiment 3.  In this experiment, we consider
\[ {\varepsilon _{j}}\mid \{{\kappa _{j}}=m\}\sim \left\{\begin{array}{l@{\hskip10.0pt}l}N(0,1.25),\hspace{1em}& m=1,\\ {} {T_{5}},\hspace{1em}& m=2.\end{array}\right.\]
The results of Experiment 3 for the mLLRE are presented in Table 3.
Here ${H_{\ast }^{(2)}}\approx 0.6502$. The simulation results follow a pattern similar to that observed in the previous experiments.

7 Conclusions

We examined the asymptotic behavior of the modified local linear regression estimator for a mixture of regressions model. We proved that the modified estimator is asymptotically normal. The obtained rate of convergence of this estimator to the unknown value of the regression function at a given point is the same, regardless of whether the density function of a regressor has a jump at this point or not.
Based on the developed asymptotic theory, the optimal bandwidth parameter was derived that minimizes the asymptotic mean squared error of the estimator. The asymptotic theory was tested in a simulation experiment, and the results of this experiment are consistent with the theoretical ones.
A subject of further research is the development of a theory of optimal choice of the tuning parameters of the modified local linear regression estimator, that is, the bandwidth and the kernel function.

References

[1] 
Borovkov, A.A.: Probability Theory. Springer, London (2013) MR3086572. https://doi.org/10.1007/978-1-4471-5201-9
[2] 
Dychko, H., Maiboroda, R.: A generalized Nadaraya-Watson estimator for observations obtained from a mixture. Theory Probab. Math. Stat. 100, 61–76 (2020). MR3992993. https://doi.org/10.1090/tpms/1094
[3] 
Fan, J.: Local linear regression smoothers and their minimax efficiencies. Ann. Stat. 21(1), 196–216 (1993). MR1212173. https://doi.org/10.1214/aos/1176349022
[4] 
Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Chapman & Hall, London (1996) MR1383587
[5] 
Horbunov, D., Maiboroda, R.: Cross-validation for local-linear regression by observations from mixture (in Ukrainian). Bull. Taras Shevchenko Natl. Univ. Kyiv., Ser. Phys. Math. 1, 37–43 (2023). https://doi.org/10.17721/1812-5409.2023/1.5
[6] 
Horbunov, D., Maiboroda, R.: Consistency of local linear regression estimator for mixtures with varying concentrations. Mod. Stoch. Theory Appl. 11(3), 359–372 (2024). MR4757725. https://doi.org/10.15559/24-VMSTA250
[7] 
Maiboroda, R., Sugakova, O.: Estimation and Classification by Observations from Mixture (in Ukrainian). Kyiv University Publishers, Kyiv (2008)
[8] 
Maiboroda, R., Sugakova, O.: Statistics of mixtures with varying concentrations with application to DNA microarray data analysis. J. Nonparametr. Stat. 24(1), 201–205 (2012). MR2885834. https://doi.org/10.1080/10485252.2011.630076
[9] 
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley-Interscience, New York (2000). MR1789474. https://doi.org/10.1002/0471721182
[10] 
Pidnebesna, A., Fajnerová, I., Horáček, J., Hlinka, J.: Mixture components inference for sparse regression: introduction and application for estimation of neuronal signal from fMRI BOLD. Appl. Math. Model. 116, 735–748 (2023). MR4522959. https://doi.org/10.1016/j.apm.2022.11.034
[11] 
Titterington, D.M., Smith, A.F., Makov, U.E.: Analysis of Finite Mixture Distributions. Wiley, New York (1985) MR0838090
[12] 
Yao, W., Xiang, S.: Semiparametric mixtures of nonparametric regressions. Ann. Inst. Stat. Math. 70, 131–154 (2018). MR3742821. https://doi.org/10.1007/s10463-016-0584-7
[13] 
Yao, W., Xiang, S.: Semiparametric mixtures of regressions with single-index for model based clustering. Adv. Data Anal. Classif. April (2020). MR4118951. https://doi.org/10.1007/s11634-020-00392-w
[14] 
Yao, W., Xiang, S.: Mixture Models: Parametric, Semiparametric, and New Directions. CRC Press, London (2024)
[15] 
Young, D.S., Hunter, D.R.: Mixtures of regressions with predictor-dependent mixing proportions. Comput. Stat. Data Anal. 54, 2253–2266 (2010). MR2720486. https://doi.org/10.1016/j.csda.2010.04.002

Copyright
© 2025 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Nonparametric regression, mixture with varying concentrations, local linear regression, asymptotic normality, bandwidth selection

MSC2020
62-04 62G05 62G08 62G20
