Modern Stochastics: Theory and Applications 2(2), 131–146 (2015). Research Article VMSTA27, doi:10.15559/15-VMSTA27. Published by VTeX.

Identifiability of logistic regression with homoscedastic error: Berkson model

Sergiy Shklyar (shklyar@univ.kiev.ua), Taras Shevchenko National University of Kyiv, Ukraine

Received 20 May 2015; revised 19 June 2015; accepted 20 June 2015.

© 2015 The Author(s). Open access article under the CC BY license.

We consider the Berkson model of logistic regression with Gaussian homoscedastic error in the regressor. The measurement error variance can be either known or unknown. We deal with both the functional and the structural case. Sufficient conditions for identifiability of the regression coefficients are presented.

Conditions for identifiability of the model are studied. In the case where the error variance is known, the regression parameters are identifiable if the distribution of the observed regressor is not concentrated at a single point. In the case where the error variance is unknown, the regression parameters are identifiable if the distribution of the observed regressor is not concentrated at three (or fewer) points.

The key analytic tools are relations between the smoothed logistic distribution function and its derivatives.

Keywords: logistic regression, binary regression, errors in variables, Berkson model, regression calibration model. MSC: 62J12.
Introduction

Statistical model. Consider logistic regression with Berkson-type error in the explanatory variable. One trial is distributed as follows. $X_n^{\mathrm{obs}}$ is the observed (or assigned) surrogate regressor. The true regressor is $X_n = X_n^{\mathrm{obs}} + U_n$, where the error $U_n \sim N(0,\tau^2)$ is independent of $X_n^{\mathrm{obs}}$. The response $Y_n$ is a binary random variable and attains either 0 or 1 with
\[ \mathsf{P}\big(Y_n=1 \mid X_n^{\mathrm{obs}}, X_n\big) = \frac{\exp(\beta_0+\beta_1 X_n)}{1+\exp(\beta_0+\beta_1 X_n)}. \]

We consider both the functional and the structural model. In the functional model, the $X_n^{\mathrm{obs}}$ are nonrandom; in the structural model, the $X_n^{\mathrm{obs}}$ are i.i.d. random variables, so that in the latter model $(X_n^{\mathrm{obs}}, X_n, Y_n)$ are i.i.d. random triples.

The couples $(X_n^{\mathrm{obs}}, Y_n)$, $n=1,\dots,N$, are observed. The vector $\beta = (\beta_0, \beta_1)$ is the parameter of interest.

The error variance $\tau^2$ can be either known or unknown; we consider both cases. Conditions for identifiability of the model (or of the parameter $\beta$) are presented.

Overview. Berkson models of logistic regression and probit regression were set up in Burr [1]. For probit regression, it is shown there that the introduction of a Berkson-type error is equivalent to a rescaling of the regression parameters. As a consequence, the Berkson model of probit regression is identifiable if $\tau^2$ is known and is not identifiable if $\tau^2$ is unknown.

The identifiability of the classical measurement-error model was studied by Küchenhoff [3]. He assumes that both the regressor and the measurement error are normally distributed. Then univariate logistic regression is identifiable (here $\tau^2$ may be unknown), whereas multiple logistic regression is not identifiable. Our results can be proved similarly to [3] if we assume that the distribution of the surrogate regressor $X^{\mathrm{obs}}$ has unbounded support.

For a classification of errors-in-variables regression models and various estimation methods, see the monograph by Carroll et al. [2].

Identifiability of the statistical model can be used in the proof of consistency of an estimator. For known $\tau^2$, the strong consistency of the maximum likelihood estimator is obtained by Shklyar [4]. But if $\tau^2$ is unknown, the maximum likelihood estimator seems to be unstable (see the discussion in the literature).

Convolution of logistic function with normal density

Consider the function
\[ L_0(x,\sigma^2) = \mathsf{E}\,\frac{\exp(x-\xi)}{1+\exp(x-\xi)}, \qquad \xi \sim N(0,\sigma^2),\ x\in\mathbb{R},\ \sigma^2 \ge 0, \tag{1} \]
that is, $L_0(x,0) = e^x/(1+e^x)$ and
\[ L_0(x,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} \frac{\exp(x-t)}{1+\exp(x-t)}\, e^{-t^2/(2\sigma^2)}\,dt \qquad \text{for } \sigma^2 > 0. \]
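For $\sigma^2>0$ the function $L_0$ has no closed form, but it is straightforward to evaluate numerically. As a quick illustration (the implementation choices below, including the function name and quadrature order, are mine), $L_0(x,\sigma^2)$ can be computed by Gauss–Hermite quadrature:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def L0(x, sigma2, n=64):
    """Smoothed logistic cdf: E[ exp(x - xi) / (1 + exp(x - xi)) ], xi ~ N(0, sigma2)."""
    if sigma2 == 0:
        return 1.0 / (1.0 + np.exp(-x))
    # Gauss-Hermite quadrature: E f(xi) ~ pi^{-1/2} * sum_i w_i f(sqrt(2*sigma2) * s_i)
    s, w = hermgauss(n)
    return np.sum(w / (1.0 + np.exp(-(x - np.sqrt(2.0 * sigma2) * s)))) / np.sqrt(np.pi)
```

For $\sigma^2=0$ this reduces to the logistic function itself; increasing $\sigma^2$ flattens the curve toward $1/2$.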

Denote the derivatives with respect to x:
\[ L_k(x,\sigma^2) = \frac{\partial^k}{\partial x^k}\, L_0(x,\sigma^2). \tag{2} \]

Differentiation of Lk(x,σ2) with respect to the second argument is described in Appendix A.

The distribution of $Y_i$ given $X_i^{\mathrm{obs}}$ is
\[ \mathsf{P}\big[Y_i=1 \mid X_i^{\mathrm{obs}}\big] = \mathsf{E}\big[\mathsf{P}[Y_i=1 \mid X_i^{\mathrm{obs}}, X_i] \,\big|\, X_i^{\mathrm{obs}}\big] = \mathsf{E}\left[\frac{\exp(\beta_0+\beta_1 X_i)}{1+\exp(\beta_0+\beta_1 X_i)} \,\Big|\, X_i^{\mathrm{obs}}\right] = L_0\big(\beta_0+\beta_1 X_i^{\mathrm{obs}},\ \beta_1^2\tau^2\big) \tag{3} \]
since $\beta_0+\beta_1 X_i \mid X_i^{\mathrm{obs}} \sim N\big(\beta_0+\beta_1 X_i^{\mathrm{obs}},\ \beta_1^2\tau^2\big)$.
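The identity (3) is easy to check by simulation. The following sketch (my own illustration; all parameter values are arbitrary) compares a Monte Carlo average over the Berkson error with the quadrature value of $L_0$:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

rng = np.random.default_rng(0)
b0, b1, tau2, x_obs = -0.5, 1.2, 0.8, 1.0   # arbitrary illustrative values

# Left-hand side of (3): average the response probability over U ~ N(0, tau2)
u = rng.normal(0.0, np.sqrt(tau2), size=1_000_000)
p_mc = np.mean(1.0 / (1.0 + np.exp(-(b0 + b1 * (x_obs + u)))))

# Right-hand side of (3): L0(b0 + b1*x_obs, b1^2 * tau2) via Gauss-Hermite quadrature
s, w = hermgauss(64)
arg = b0 + b1 * x_obs - np.sqrt(2.0 * b1**2 * tau2) * s
p_quad = np.sum(w / (1.0 + np.exp(-arg))) / np.sqrt(np.pi)
```

The two values agree up to Monte Carlo error.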

Identifiability when $\tau^2$ is known

Theorem 1. If in the functional model not all $X_n^{\mathrm{obs}}$ are equal, then the model is identifiable.

Proof. Suppose that for two values of the parameter, $\beta^{(1)} = (\beta_0^{(1)}, \beta_1^{(1)})$ and $\beta^{(2)} = (\beta_0^{(2)}, \beta_1^{(2)})$ with $\beta^{(1)} \ne \beta^{(2)}$, the distributions of the observations coincide. Then for all $i=1,2,\dots,N$,
\[ \mathsf{P}_{\beta^{(1)}}(Y_i=1) = \mathsf{P}_{\beta^{(2)}}(Y_i=1), \]
\[ L_0\big(\beta_0^{(1)}+\beta_1^{(1)} X_i^{\mathrm{obs}},\ (\beta_1^{(1)})^2\tau^2\big) = L_0\big(\beta_0^{(2)}+\beta_1^{(2)} X_i^{\mathrm{obs}},\ (\beta_1^{(2)})^2\tau^2\big). \]
However, by Lemma 4.1 from [4] the equation
\[ L_0\big(\beta_0^{(1)}+\beta_1^{(1)} x,\ (\beta_1^{(1)})^2\tau^2\big) = L_0\big(\beta_0^{(2)}+\beta_1^{(2)} x,\ (\beta_1^{(2)})^2\tau^2\big) \]
has no more than one solution x. Hence, all $X_i^{\mathrm{obs}}$ are equal. □
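The single-crossing property invoked in the proof can be observed numerically: for a common $\tau^2$, two distinct response curves $x \mapsto L_0(\beta_0+\beta_1 x,\ \beta_1^2\tau^2)$ cross at most once. A sketch (my own illustration; the parameter values are arbitrary):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

S, W = hermgauss(96)

def L0(x, sigma2):
    """Smoothed logistic cdf, vectorized in x, by Gauss-Hermite quadrature."""
    arg = np.atleast_1d(x)[None, :] - np.sqrt(2.0 * sigma2) * S[:, None]
    return np.sum(W[:, None] / (1.0 + np.exp(-arg)), axis=0) / np.sqrt(np.pi)

tau2 = 1.0
b0a, b1a = 0.3, 1.0      # first parameter pair (arbitrary)
b0b, b1b = -0.4, 1.6     # second parameter pair (arbitrary)
x = np.linspace(-8.0, 8.0, 20001)
diff = L0(b0a + b1a * x, b1a**2 * tau2) - L0(b0b + b1b * x, b1b**2 * tau2)
# count sign changes of the difference on the grid
n_crossings = int(np.sum(np.diff(np.sign(diff)) != 0))
```

With these parameters the flatter curve dominates in the left tail and is dominated in the right tail, so exactly one crossing occurs.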

By definition, a degenerate distribution is a distribution concentrated at a single point. For the next theorem, see the proof of Theorem 5.1 in [4].

Theorem 2 ([4]). If in the structural model the distribution of $X_1^{\mathrm{obs}}$ is not degenerate, then the parameter $\beta$ is identifiable.

Identifiability when $\tau^2$ is unknown

For fixed $\sigma^2$, the function $L_0(x,\sigma^2)$ is a bijection $\mathbb{R}\to(0,1)$. Hence, for fixed $\sigma_1^2$ and $\sigma_2^2$, the relation
\[ L_0(y,\sigma_1^2) = L_0(x,\sigma_2^2) \tag{4} \]
defines a bijection $\mathbb{R}\to\mathbb{R}$; see Fig. 1.

Lemma 3. For fixed $\sigma_1^2 \ge 0$ and $\sigma_2^2 \ge 0$, the sign of the second derivative of the implicit function (4) is
\[ \operatorname{sign}\Big(\frac{d^2y}{dx^2}\Big) = \operatorname{sign}(\sigma_2^2-\sigma_1^2)\operatorname{sign}(x). \]

Proof. Differentiating (4), we get
\[ L_1(y,\sigma_1^2)\,dy = L_1(x,\sigma_2^2)\,dx; \qquad \frac{dy}{dx} = \frac{L_1(x,\sigma_2^2)}{L_1(y,\sigma_1^2)}. \]
Then
\[ \frac{d^2y}{dx^2} = \frac{L_2(x,\sigma_2^2)\,L_1(y,\sigma_1^2) - L_1(x,\sigma_2^2)\,L_2(y,\sigma_1^2)\,\frac{dy}{dx}}{L_1(y,\sigma_1^2)^2} = \frac{L_2(x,\sigma_2^2)\,L_1(y,\sigma_1^2)^2 - L_1(x,\sigma_2^2)^2\,L_2(y,\sigma_1^2)}{L_1(y,\sigma_1^2)^3} = \left(\frac{L_2(x,\sigma_2^2)}{L_1(x,\sigma_2^2)^2} - \frac{L_2(y,\sigma_1^2)}{L_1(y,\sigma_1^2)^2}\right)\cdot\frac{L_1(x,\sigma_2^2)^2}{L_1(y,\sigma_1^2)}. \]
Thus,
\[ \operatorname{sign}\Big(\frac{d^2y}{dx^2}\Big) = \operatorname{sign}\left(\frac{L_2(x,\sigma_2^2)}{L_1(x,\sigma_2^2)^2} - \frac{L_2(y,\sigma_1^2)}{L_1(y,\sigma_1^2)^2}\right). \tag{5} \]

Fig. 1. The plot of equation $L_0(y,\sigma_1^2) = L_0(x,\sigma_2^2)$ for $\sigma_1^2 < \sigma_2^2$.
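The sign pattern $\operatorname{sign}(y'') = \operatorname{sign}(\sigma_2^2-\sigma_1^2)\operatorname{sign}(x)$ can be probed numerically by inverting $L_0$ with bisection and taking a second difference. The sketch below is my own illustration with the arbitrary choices $\sigma_1^2=1$, $\sigma_2^2=4$:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

S, W = hermgauss(96)

def L0(x, sigma2):
    """Smoothed logistic cdf by Gauss-Hermite quadrature (scalar x)."""
    arg = x - np.sqrt(2.0 * sigma2) * S
    return np.sum(W / (1.0 + np.exp(-arg))) / np.sqrt(np.pi)

def y_of_x(x, s1, s2):
    """Solve L0(y, s1) = L0(x, s2) for y by bisection; L0 is increasing in its first argument."""
    target = L0(x, s2)
    lo, hi = -60.0, 60.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if L0(mid, s1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s1, s2 = 1.0, 4.0        # sigma_1^2 < sigma_2^2 (arbitrary)
h = 1e-2
# central second difference of the implicit function y(x) at two test points
curv = {x: (y_of_x(x + h, s1, s2) - 2.0 * y_of_x(x, s1, s2) + y_of_x(x - h, s1, s2)) / h**2
        for x in (-1.5, 1.5)}
```

Since $\sigma_2^2 > \sigma_1^2$ here, the curve is convex for $x>0$ and concave for $x<0$, as in Fig. 1.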

Denote by $\mu(z,\sigma^2)$ the solution to the equation $L_0(\mu,\sigma^2)=z$. Note that since $L_0(x,\sigma^2)$ is the cdf of a symmetric distribution, $\operatorname{sign}(L_0(x,\sigma^2)-0.5) = \operatorname{sign}(x)$. Therefore, $\operatorname{sign}(\mu(z,\sigma^2)) = \operatorname{sign}(z-0.5)$.

Find the derivative $\frac{d}{dv}\Big(\frac{L_2(\mu(z,v),v)}{L_1(\mu(z,v),v)^2}\Big)$ for fixed z. By the implicit function theorem,
\[ \frac{d\mu(z,v)}{dv} = -\frac{L_2(\mu(z,v),v)}{2\,L_1(\mu(z,v),v)}; \]
also,
\[ \frac{\partial}{\partial x}\Big(\frac{L_2(x,v)}{L_1(x,v)^2}\Big) = \frac{L_3(x,v)}{L_1(x,v)^2} - \frac{2\,L_2(x,v)^2}{L_1(x,v)^3}, \qquad \frac{\partial}{\partial v}\Big(\frac{L_2(x,v)}{L_1(x,v)^2}\Big) = \frac{L_4(x,v)\,L_1(x,v) - 2\,L_2(x,v)\,L_3(x,v)}{2\,L_1(x,v)^3}. \]
Then
\[ \frac{d}{dv}\Big(\frac{L_2(\mu(z,v),v)}{L_1(\mu(z,v),v)^2}\Big) = -\frac{L_2}{2L_1}\cdot\Big(\frac{L_3}{L_1^2} - \frac{2L_2^2}{L_1^3}\Big) + \frac{L_4 L_1 - 2 L_2 L_3}{2 L_1^3} = \frac{L_4 L_1^2 - 3 L_3 L_2 L_1 + 2 L_2^3}{2 L_1^4}, \]
where the $L_k$ are evaluated at the point $(\mu(z,v),v)$. By Lemma 10,
\[ \operatorname{sign}\left(\frac{d}{dv}\Big(\frac{L_2(\mu(z,v),v)}{L_1(\mu(z,v),v)^2}\Big)\right) = \operatorname{sign}\big(\mu(z,v)\big) = \operatorname{sign}(z-0.5). \]

Hence, the function $v \mapsto \frac{L_2(\mu(z,v),v)}{L_1(\mu(z,v),v)^2}$ is monotone (it is increasing for $z>0.5$ and decreasing for $z<0.5$). For x and y satisfying (4),
\[ x = \mu(z,\sigma_2^2) \quad\text{and}\quad y = \mu(z,\sigma_1^2) \]
with $z = L_0(y,\sigma_1^2) = L_0(x,\sigma_2^2)$; note that $\operatorname{sign}(z-0.5)=\operatorname{sign}(x)$. Then
\[ \operatorname{sign}\left(\frac{L_2(x,\sigma_2^2)}{L_1(x,\sigma_2^2)^2} - \frac{L_2(y,\sigma_1^2)}{L_1(y,\sigma_1^2)^2}\right) = \operatorname{sign}(\sigma_2^2-\sigma_1^2)\operatorname{sign}(x), \]
and with (5), we obtain the desired equality $\operatorname{sign}(d^2y/dx^2) = \operatorname{sign}(\sigma_2^2-\sigma_1^2)\operatorname{sign}(x)$. □

Lemma 4. The equation
\[ L_0\big(\beta_0^{(1)}+\beta_1^{(1)}x,\ \sigma_1^2\big) = L_0\big(\beta_0^{(2)}+\beta_1^{(2)}x,\ \sigma_2^2\big) \tag{6} \]
has no more than three solutions, unless either
\[ \beta^{(1)}=\beta^{(2)} \quad\text{and}\quad \sigma_1^2=\sigma_2^2 \tag{7} \]
or
\[ \beta_1^{(1)}=\beta_1^{(2)}=0 \quad\text{and}\quad L_0\big(\beta_0^{(1)},\sigma_1^2\big) = L_0\big(\beta_0^{(2)},\sigma_2^2\big). \tag{8} \]
In the exceptional cases (7) and (8), equation (6) is an identity.

Proof. The idea of the proof is as follows: if a twice differentiable function $y(x)$ satisfies (4), then its plot either is a straight line (if $\sigma_1^2=\sigma_2^2$) or intersects any straight line at no more than three points. Consider four cases.

Case 1: $\sigma_1^2=\sigma_2^2$. Since the function $L_0(z,\sigma^2)$ is strictly increasing in z, Eq. (6) is equivalent to $\beta_0^{(1)}+\beta_1^{(1)}x = \beta_0^{(2)}+\beta_1^{(2)}x$. Equation (6) has exactly one solution if $\beta_1^{(1)} \ne \beta_1^{(2)}$; it is an identity if $\beta^{(1)}=\beta^{(2)}$; and it has no solutions if $\beta_1^{(1)}=\beta_1^{(2)}$ but $\beta_0^{(1)} \ne \beta_0^{(2)}$.

Case 2: $\beta_1^{(2)}=0$ and $\beta_1^{(1)} \ne 0$. For any fixed $\sigma^2$, the function $z \mapsto L_0(z,\sigma^2)$ is a bijection $\mathbb{R}\to(0,1)$. Denote the inverse function by $\mu(Z,\sigma^2)$: $L_0(z,\sigma^2)=Z$ if and only if $z=\mu(Z,\sigma^2)$. Equation (6) has the unique solution
\[ x = \frac{\mu\big(L_0(\beta_0^{(2)},\sigma_2^2),\ \sigma_1^2\big) - \beta_0^{(1)}}{\beta_1^{(1)}}. \]

Case 3: $\beta_1^{(1)}=\beta_1^{(2)}=0$. Neither side of (6) depends on x, and Eq. (6) becomes $L_0(\beta_0^{(1)},\sigma_1^2) = L_0(\beta_0^{(2)},\sigma_2^2)$. Equation (6) either holds for all x or holds for no x.

Case 4: $\sigma_1^2 \ne \sigma_2^2$ and $\beta_1^{(2)} \ne 0$. Make a linear variable substitution: denote $z_2 = \beta_0^{(2)}+\beta_1^{(2)}x$. Then Eq. (6) becomes
\[ L_0\Big(\beta_0^{(1)} + \frac{\beta_1^{(1)}}{\beta_1^{(2)}}\,(z_2-\beta_0^{(2)}),\ \sigma_1^2\Big) = L_0\big(z_2,\sigma_2^2\big). \tag{9} \]
Define the function $z_1(z_2)$ from the equation
\[ L_0\big(z_1(z_2),\sigma_1^2\big) = L_0\big(z_2,\sigma_2^2\big). \]
The function $z_1(z_2): \mathbb{R}\to\mathbb{R}$ is implicitly defined by Eq. (4): the equality there holds if and only if $y=z_1(x)$. Hence, the function $z_1(z_2)$ satisfies Lemma 3. Equation (9) is equivalent to
\[ z_1(z_2) - \beta_0^{(1)} - \frac{\beta_1^{(1)}}{\beta_1^{(2)}}\,(z_2-\beta_0^{(2)}) = 0. \tag{10} \]
By Lemma 3,
\[ \operatorname{sign}\left(\frac{d^2}{dz_2^2}\Big(z_1(z_2) - \beta_0^{(1)} - \frac{\beta_1^{(1)}}{\beta_1^{(2)}}\,(z_2-\beta_0^{(2)})\Big)\right) = \operatorname{sign}\Big(\frac{d^2 z_1(z_2)}{dz_2^2}\Big) = \operatorname{sign}(\sigma_2^2-\sigma_1^2)\operatorname{sign}(z_2). \]
Then the derivative of the left-hand side of (10),
\[ \frac{d}{dz_2}\Big(z_1(z_2) - \beta_0^{(1)} - \frac{\beta_1^{(1)}}{\beta_1^{(2)}}\,(z_2-\beta_0^{(2)})\Big), \tag{11} \]
is strictly monotone on each of the intervals $(-\infty,0]$ and $[0,+\infty)$, and hence (11) attains 0 at no more than two points. Then the left-hand side of (10) has no more than three intervals of monotonicity, and Eq. (10) has no more than three solutions. Equation (6) has the same number of solutions. □

Theorem 5. If in the functional model there are four different values among the $X_n^{\mathrm{obs}}$, then the parameters $\beta$ and $\beta_1^2\tau^2$ are identifiable.

Proof. Suppose that two sets of parameters $(\beta^{(1)},(\tau^{(1)})^2)$ and $(\beta^{(2)},(\tau^{(2)})^2)$ provide, for the given sample of surrogate regressors $\{X_n^{\mathrm{obs}},\ n=1,\dots,N\}$, the same distribution of $Y_n$, $n=1,\dots,N$. Then for all $n=1,\dots,N$,
\[ \mathsf{P}_{\beta^{(1)},(\tau^{(1)})^2}(Y_n=1) = \mathsf{P}_{\beta^{(2)},(\tau^{(2)})^2}(Y_n=1); \]
\[ L_0\big(\beta_0^{(1)}+\beta_1^{(1)}X_n^{\mathrm{obs}},\ (\beta_1^{(1)})^2(\tau^{(1)})^2\big) = L_0\big(\beta_0^{(2)}+\beta_1^{(2)}X_n^{\mathrm{obs}},\ (\beta_1^{(2)})^2(\tau^{(2)})^2\big). \]
Thus, the equation
\[ L_0\big(\beta_0^{(1)}+\beta_1^{(1)}x,\ (\beta_1^{(1)})^2(\tau^{(1)})^2\big) = L_0\big(\beta_0^{(2)}+\beta_1^{(2)}x,\ (\beta_1^{(2)})^2(\tau^{(2)})^2\big) \]
has at least four solutions. Then by Lemma 4, either
\[ \beta^{(1)}=\beta^{(2)} \quad\text{and}\quad (\beta_1^{(1)})^2(\tau^{(1)})^2 = (\beta_1^{(2)})^2(\tau^{(2)})^2, \]
or
\[ \beta_1^{(1)}=\beta_1^{(2)}=0 \quad\text{and}\quad L_0\big(\beta_0^{(1)},(\beta_1^{(1)})^2(\tau^{(1)})^2\big) = L_0\big(\beta_0^{(2)},(\beta_1^{(2)})^2(\tau^{(2)})^2\big). \]
In the latter alternative,
\[ (\beta_1^{(1)})^2(\tau^{(1)})^2 = (\beta_1^{(2)})^2(\tau^{(2)})^2 = 0 \quad\text{and}\quad \beta_0^{(1)}=\beta_0^{(2)} \]
since $L_0(b_0,0) = \frac{1}{1+e^{-b_0}}$ is a strictly increasing function of $b_0$. □

Theorem 6. If in the structural model the distribution of $X_1^{\mathrm{obs}}$ is not concentrated at three (or fewer) points, then the parameters $\beta$ and $\beta_1^2\tau^2$ are identifiable.

Proof. Suppose that there are two sets of parameters $(\beta^{(1)},(\tau^{(1)})^2)$ and $(\beta^{(2)},(\tau^{(2)})^2)$ for which the same bivariate distribution of $(X_1^{\mathrm{obs}},Y_1)$ is obtained. The random variable $\mathsf{P}[Y_1=1 \mid X_1^{\mathrm{obs}}]$ satisfies Eq. (3) almost surely for each set of parameters. Hence, the equality
\[ L_0\big(\beta_0^{(1)}+\beta_1^{(1)}X_1^{\mathrm{obs}},\ (\beta_1^{(1)})^2(\tau^{(1)})^2\big) = L_0\big(\beta_0^{(2)}+\beta_1^{(2)}X_1^{\mathrm{obs}},\ (\beta_1^{(2)})^2(\tau^{(2)})^2\big) \]
holds almost surely. The rest of the proof is the same as in Theorem 5. □
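The bound in Lemma 4 (at most three intersections when the smoothing variances differ) can be observed numerically by counting sign changes of the difference of the two curves on a fine grid. This is my own illustration; the parameter sets below are arbitrary:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

S, W = hermgauss(96)

def L0(x, sigma2):
    """Smoothed logistic cdf, vectorized in x, by Gauss-Hermite quadrature."""
    arg = np.atleast_1d(x)[None, :] - np.sqrt(2.0 * sigma2) * S[:, None]
    return np.sum(W[:, None] / (1.0 + np.exp(-arg)), axis=0) / np.sqrt(np.pi)

x = np.linspace(-8.0, 8.0, 20001)
# (beta0^(1), beta1^(1), sigma1^2, beta0^(2), beta1^(2), sigma2^2): arbitrary test values
cases = [
    (0.3, 1.0, 1.0, -0.2, 1.4, 2.0),
    (0.0, 0.8, 0.5, 0.5, -1.1, 2.5),
    (1.0, 0.0, 1.5, -0.3, 0.7, 0.4),
]
counts = []
for b01, b11, s1, b02, b12, s2 in cases:
    diff = L0(b01 + b11 * x, s1) - L0(b02 + b12 * x, s2)
    counts.append(int(np.sum(np.diff(np.sign(diff)) != 0)))
```

Each pair of curves crosses between one and three times, never more, in line with the lemma.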
Appendix A. Differentiation of $L_k(x,\sigma^2)$

Consider the sum of two independent random variables $\zeta = \lambda + \xi$, where λ has the logistic distribution,
\[ \mathsf{P}(\lambda \le x) = \frac{\exp(x)}{1+\exp(x)}, \qquad x\in\mathbb{R}, \]
and $\xi \sim N(0,\sigma^2)$. We allow $\sigma^2=0$, in which case $\xi=0$ almost surely. The function $L_0(x,\sigma^2)$ defined in (1) is the cdf of ζ, and the function $L_1(x,\sigma^2)$ defined in (2) is the pdf of ζ.

The partial derivatives of $L_k(x,v)$ are
\[ \frac{\partial}{\partial x}\,L_k(x,v) = L_{k+1}(x,v), \qquad \frac{\partial}{\partial v}\,L_k(x,v) = \frac{1}{2}\,L_{k+2}(x,v); \tag{12} \]
see the proof in [4, Section 2]. The functions $L_k(x,v)$ are infinitely differentiable and bounded on $\mathbb{R}\times[0,+\infty)$.

Since the distribution of ζ is symmetric,
\[ L_k(-x,\sigma^2) = (-1)^{k-1}\,L_k(x,\sigma^2), \qquad k\ge 1, \]
that is, $L_1(x,\sigma^2)$ and $L_3(x,\sigma^2)$ are even functions in x, and $L_2(x,\sigma^2)$ and $L_4(x,\sigma^2)$ are odd functions in x.

Appendix B. The key inequality

The next lemma is similar to Lemma 2.1 in [4]; hence, the proof is brief, and we refer to [4] for details.

Lemma 7. Let ξ and η be two independent random variables, where $\xi \sim N(0,1)$. Denote $\zeta = \xi + \eta$, and let $p_\zeta(z)$ be the pdf of ζ. Then
\[ \frac{d^3}{dz^3}\big(\ln p_\zeta(z)\big) = \mu_3[\eta \mid \zeta = z], \]
where $\mu_3[\eta\mid\zeta=z]$ is the third conditional central moment,
\[ \mu_3[\eta\mid\zeta=z] = \mathsf{E}\big[\big(\eta - \mathsf{E}[\eta\mid\zeta=z]\big)^3 \,\big|\, \zeta=z\big]. \]

Proof. We have
\[ p_\zeta(z) = \mathsf{E}\, p_\xi(z-\eta) = \frac{1}{\sqrt{2\pi}}\,\mathsf{E}\, e^{-\frac12(z-\eta)^2}. \]
Then
\[ p'_\zeta(z) = \frac{1}{\sqrt{2\pi}}\,\mathsf{E}\big[(\eta-z)\, e^{-\frac12(z-\eta)^2}\big], \]
\[ \frac{d}{dz}\big(\ln p_\zeta(z)\big) = \frac{p'_\zeta(z)}{p_\zeta(z)} = \frac{\mathsf{E}[(\eta-z)e^{-\frac12(z-\eta)^2}]}{\mathsf{E}\, e^{-\frac12(z-\eta)^2}} = \frac{\mathsf{E}[\eta\, e^{-\frac12(z-\eta)^2}]}{\mathsf{E}\, e^{-\frac12(z-\eta)^2}} - z, \]
\[ \frac{d^2}{dz^2}\big(\ln p_\zeta(z)\big) = \frac{\mathsf{E}[\eta^2 e^{-\frac12(z-\eta)^2}]}{\mathsf{E}\, e^{-\frac12(z-\eta)^2}} - \frac{\big(\mathsf{E}[\eta\, e^{-\frac12(z-\eta)^2}]\big)^2}{\big(\mathsf{E}\, e^{-\frac12(z-\eta)^2}\big)^2} - 1, \]
and, after simplification,
\[ \frac{d^3}{dz^3}\big(\ln p_\zeta(z)\big) = \big(\mathsf{E}\, e^{-\frac12(z-\eta)^2}\big)^{-3}\Big(\mathsf{E}[\eta^3 e^{-\frac12(z-\eta)^2}]\,\big(\mathsf{E}\, e^{-\frac12(z-\eta)^2}\big)^2 - 3\,\mathsf{E}[\eta^2 e^{-\frac12(z-\eta)^2}]\;\mathsf{E}[\eta\, e^{-\frac12(z-\eta)^2}]\;\mathsf{E}\, e^{-\frac12(z-\eta)^2} + 2\big(\mathsf{E}[\eta\, e^{-\frac12(z-\eta)^2}]\big)^3\Big). \tag{13} \]
If η has a pdf, then the conditional pdf of η given $\zeta=z$ is equal to
\[ p_{\eta\mid\zeta=z}(y) = \frac{p_\eta(y)\, e^{-\frac12(z-y)^2}}{\mathsf{E}\, e^{-\frac12(z-\eta)^2}}; \]
otherwise, we can use the density of the conditional distribution with respect to the marginal distribution,
\[ \frac{d\,\mathrm{cdf}_{\eta\mid\zeta=z}(y)}{d\,\mathrm{cdf}_{\eta}(y)} = \frac{e^{-\frac12(z-y)^2}}{\mathsf{E}\, e^{-\frac12(z-\eta)^2}}. \]
In either case, the conditional moments of η given $\zeta=z$ are equal to
\[ \mathsf{E}[\eta^k \mid \zeta=z] = \frac{\mathsf{E}[\eta^k e^{-\frac12(z-\eta)^2}]}{\mathsf{E}\, e^{-\frac12(z-\eta)^2}}. \tag{14} \]
From (13) and (14) it follows that
\[ \frac{d^3}{dz^3}\big(\ln p_\zeta(z)\big) = \mathsf{E}[\eta^3\mid\zeta=z] - 3\,\mathsf{E}[\eta^2\mid\zeta=z]\,\mathsf{E}[\eta\mid\zeta=z] + 2\big(\mathsf{E}[\eta\mid\zeta=z]\big)^3 = \mu_3[\eta\mid\zeta=z]. \qquad \square \]

Corollary 8. Let ξ and η be independent random variables such that $\xi \sim N(\mu,\sigma^2)$. Denote $\zeta=\xi+\eta$, and denote the pdf of ζ by $p_\zeta(z)$. Then
\[ \frac{d^3}{dz^3}\big(\ln p_\zeta(z)\big) = \frac{1}{\sigma^6}\,\mu_3[\eta\mid\zeta=z]. \]
(This follows by applying Lemma 7 to $(\xi-\mu)/\sigma$ and $\eta/\sigma$ and rescaling.)

Lemma 9. Assume that the distribution of a random variable X satisfies the following conditions:

1) X has a continuously differentiable density $p_X(x)$;

2) X is unimodal in the following sense: there exists a mode $M\in\mathbb{R}$ such that for all $x\in\mathbb{R}$ we have $\operatorname{sign}(p'_X(x)) = \operatorname{sign}(M-x)$;

3) whenever $x_1 < M < x_2$ and $p_X(x_1) = p_X(x_2)$, we have $p'_X(x_1) > -p'_X(x_2)$;

4) $\mathsf{E}|X|^3 < \infty$.

Then $\mu_3(X) := \mathsf{E}(X-\mathsf{E}X)^3 > 0$.

Proof. 1) First, we show that $\mathsf{E}X > M$. Denote by $x_1(z)$ and $x_2(z)$ the solutions to the equation $p_X(x)=z$ (see Fig. 2):
\[ x_1(z) < M < x_2(z) \quad\text{if } 0 < z < \max(p_X); \qquad x_1(z) = M = x_2(z) \quad\text{if } z = \max(p_X); \]
\[ p_X\big(x_1(z)\big) = p_X\big(x_2(z)\big) = z \quad\text{if } 0 < z \le \max(p_X). \]
Represent the expectation as a double integral and change the order of integration:
\[ \mathsf{E}X = M + \int_{-\infty}^{\infty} (x-M)\,p_X(x)\,dx = M + \iint_{\{(x,z)\,:\,0\le z\le p_X(x)\}} (x-M)\,dx\,dz = M + \int_0^{\max(p_X)}\Big(\int_{x_1(z)}^{x_2(z)} (x-M)\,dx\Big)dz = M + \int_0^{\max(p_X)} \frac{(x_2(z)-M)^2 - (M-x_1(z))^2}{2}\,dz. \tag{15} \]

Fig. 2. To the proof of Lemma 9, part 1): a sample $p_X(x)$ and the definition of $x_1(z)$ and $x_2(z)$.

For all $x_2 > M$, by the implicit function theorem,
\[ \frac{d}{dx_2}\, x_1\big(p_X(x_2)\big) = \frac{p'_X(x_2)}{p'_X(x_1(p_X(x_2)))} > -1 \]
because $p_X(x_1(p_X(x_2))) = p_X(x_2)$ implies $p'_X(x_1(p_X(x_2))) > -p'_X(x_2) > 0$ by condition 3). Note that $x_1(p_X(M)) = M$. By the Lagrange theorem,
\[ x_1\big(p_X(x_2)\big) = M + (x_2-M)\cdot \frac{d}{dx_3}\, x_1\big(p_X(x_3)\big)\Big|_{x_3 = M+(x_2-M)\theta} \]
for some $\theta\in(0,1)$; hence
\[ x_1\big(p_X(x_2)\big) > M - (x_2-M) \quad\text{for } x_2 > M; \]
\[ x_1(z) > M - \big(x_2(z)-M\big) \quad\text{for } 0 < z < \max(p_X); \]
\[ x_2(z) - M > M - x_1(z) > 0; \qquad \frac{(x_2(z)-M)^2}{2} > \frac{(M-x_1(z))^2}{2}. \]
Thus the last integrand in (15) is positive, and then (15) implies $\mathsf{E}X > M$.

2) Consider the function
\[ f(t) = p_X(\mathsf{E}X + t) - p_X(\mathsf{E}X - t), \]
which is odd and strictly decreasing on the interval $[-(\mathsf{E}X-M),\ \mathsf{E}X-M]$. Therefore, $f(t)$ attains 0 only once on this interval, namely at the point 0 (see Fig. 3).

Fig. 3. To the proof of Lemma 9, part 2).

If $t > \mathsf{E}X - M$ (more generally, $|t| > \mathsf{E}X - M$) and $f(t)=0$, then
\[ f'(t) = p'_X(\mathsf{E}X+t) + p'_X(\mathsf{E}X-t) > 0 \]
by condition 3) of Lemma 9. Therefore, $f(t)$ can attain 0 only once on $(\mathsf{E}X-M,\ +\infty)$, and if it attains 0 (say, at a point $t_1 > \mathsf{E}X-M > 0$), then it is increasing in a neighborhood of $t_1$. Hence, there are two possible patterns of sign changes of $f(t)$ (Fig. 3): either
\[ \exists\, t_1 > 0\ \ \forall t\in\mathbb{R}:\ \operatorname{sign}\big(f(t)\big) = \operatorname{sign}(t)\operatorname{sign}\big(|t|-t_1\big), \tag{16} \]
or
\[ \forall t\in\mathbb{R}:\ \operatorname{sign}\big(f(t)\big) = -\operatorname{sign}(t). \tag{17} \]

3) We have
\[ 0 = \mathsf{E}[X - \mathsf{E}X] = \int_{-\infty}^{\infty}(x-\mathsf{E}X)\,p_X(x)\,dx = \int_{-\infty}^{\infty} t\, p_X(\mathsf{E}X+t)\,dt = \int_0^{\infty} t\,p_X(\mathsf{E}X+t)\,dt + \int_0^{\infty}(-t)\,p_X(\mathsf{E}X-t)\,dt = \int_0^{\infty} t\,f(t)\,dt, \tag{18} \]
where $f(t)$ is defined in the second part of the proof. Note that the case (17) is impossible because otherwise the last integrand in (18) would be negative, and thus the integral could not be equal to 0.

4) Similarly to (18),
\[ \mathsf{E}(X-\mathsf{E}X)^3 = \int_0^{\infty} t^3 f(t)\,dt. \]
Subtract $t_1^2$ times Eq. (18), where $t_1$ comes from (16):
\[ \mathsf{E}(X-\mathsf{E}X)^3 = \int_0^{\infty} t\,(t^2-t_1^2)\,f(t)\,dt. \]
The integrand is positive for $t>0$, $t \ne t_1$, and hence $\mu_3[X] = \mathsf{E}(X-\mathsf{E}X)^3 > 0$. □

Lemma 10. For all $x\in\mathbb{R}$ and $\sigma^2 \ge 0$,
\[ \operatorname{sign}\big(L_4(x,\sigma^2)\,L_1(x,\sigma^2)^2 - 3\,L_3(x,\sigma^2)\,L_2(x,\sigma^2)\,L_1(x,\sigma^2) + 2\,L_2(x,\sigma^2)^3\big) = \operatorname{sign}(x). \]

Lemma 11 below is needed to prove Lemma 10. The notation $F(y)$ and $y_0$ is common for Lemmas 10 and 11. For fixed $x>0$ and $\sigma^2>0$, consider the function
\[ F(y) = \ln\Big(\frac{e^y}{(e^y+1)^2}\Big) - \frac{(y-x)^2}{2\sigma^2}. \tag{19} \]
Its derivative
\[ F'(y) = 1 - \frac{2e^y}{e^y+1} - \frac{y-x}{\sigma^2} \]
is strictly decreasing, and
\[ \lim_{y\to-\infty} F'(y) = +\infty, \qquad \lim_{y\to+\infty} F'(y) = -\infty. \]
Hence, $F'(y)$ attains 0 at a unique point. Denote this point by $y_0$; then
\[ \operatorname{sign}\big(F'(y)\big) = \operatorname{sign}(y_0 - y). \tag{20} \]

Lemma 11. For the function $F(y)$ defined in (19), for $y_0$ satisfying (20), and for $y_3$ and $y_4$ such that $F'(y_3) + F'(y_4) = 0$ and $y_3 < y_4$, we have the following inequalities:

1) $y_3 < y_0 < y_4$ and $F'(y_3) = -F'(y_4) > 0$;

2) $y_3 + y_4 > 0$;

3) $F''(y_3) < F''(y_4) < 0$;

4) $F(y_3) > F(y_4)$.

Proof. 1) Since $F'$ is strictly decreasing and $y_3 < y_4$, we have $F'(y_3) > F'(y_4) = -F'(y_3)$; hence $F'(y_3) > 0 > F'(y_4)$, and the inequality $y_3 < y_0 < y_4$ is a consequence of (20).

2) For all $y\in\mathbb{R}$,
\[ F'(y) + F'(-y) = \frac{2x}{\sigma^2} > 0. \tag{21} \]
Since $F'(y_3) + F'(-y_3) > 0$ and $F'(y_3) + F'(y_4) = 0$, we have $F'(-y_3) > F'(y_4)$, and then $-y_3 < y_4$ because the derivative $F'(y)$ is decreasing.

3) The second derivative
\[ F''(y) = -\frac{2e^y}{(e^y+1)^2} - \frac{1}{\sigma^2} \]
is an even function, strictly increasing on $[0,+\infty)$ and attaining only negative values. The inequalities $y_3 < y_4$ and $y_3 + y_4 > 0$ can be rewritten as $|y_3| < y_4$, and then
\[ F''(y_3) = F''(|y_3|) < F''(y_4) < 0. \]

4) Consider the inverse function $(F')^{-1}(t)$, $t\in\mathbb{R}$. Its derivative is
\[ \frac{d}{dt}\big((F')^{-1}(t)\big) = \frac{1}{F''((F')^{-1}(t))} < 0. \]
Then
\[ \frac{d}{dt}\, F\big((F')^{-1}(t)\big) = \frac{F'((F')^{-1}(t))}{F''((F')^{-1}(t))} = \frac{t}{F''((F')^{-1}(t))}; \]
\[ \frac{d}{dt}\Big(F\big((F')^{-1}(t)\big) - F\big((F')^{-1}(-t)\big)\Big) = \frac{t}{F''((F')^{-1}(t))} - \frac{t}{F''((F')^{-1}(-t))}. \]
Apply the already proven part 3) of Lemma 11. If $t>0$, then $(F')^{-1}(t) < (F')^{-1}(-t)$ (because $(F')^{-1}$ is a decreasing function) and $F'((F')^{-1}(t)) + F'((F')^{-1}(-t)) = t - t = 0$. Then by part 3),
\[ F''\big((F')^{-1}(t)\big) < F''\big((F')^{-1}(-t)\big) < 0, \qquad t>0. \]
Hence,
\[ \frac{d}{dt}\Big(F\big((F')^{-1}(t)\big) - F\big((F')^{-1}(-t)\big)\Big) > 0, \qquad t>0. \]
Note that at $t=0$ the difference $F\big((F')^{-1}(t)\big) - F\big((F')^{-1}(-t)\big)$ vanishes. By the Lagrange theorem, for $t>0$,
\[ F\big((F')^{-1}(t)\big) - F\big((F')^{-1}(-t)\big) = t\cdot \frac{d}{dt_1}\Big(F\big((F')^{-1}(t_1)\big) - F\big((F')^{-1}(-t_1)\big)\Big) > 0, \]
where the derivative is taken at some point $t_1\in(0,t)$. Substituting $t = F'(y_3) > 0$ (so that $-t = F'(y_4)$), we obtain $F(y_3) - F(y_4) > 0$. □

Proof of Lemma 10. Case 1: $x>0$ and $\sigma^2>0$. Recall that for fixed $\sigma^2$, $L_1(x,\sigma^2)$ is the pdf of $\eta+\xi$, where η and ξ are independent variables, $\mathsf{P}(\eta < y) = \frac{e^y}{e^y+1}$, and $\xi \sim N(0,\sigma^2)$ (see Appendix A). By Corollary 8,
\[ \frac{d^3}{dx^3}\big(\ln L_1(x,\sigma^2)\big) = \frac{1}{\sigma^6}\,\mu_3[\eta \mid \eta+\xi = x], \tag{22} \]
but
\[ \frac{d^3}{dx^3}\big(\ln L_1(x,\sigma^2)\big) = \frac{L_4 L_1^2 - 3 L_3 L_2 L_1 + 2 L_2^3}{L_1^3}, \tag{23} \]
where the $L_k$ are evaluated at the point $(x,\sigma^2)$. Since $L_1(x,\sigma^2) > 0$, we have to prove that $\mu_3[\eta\mid\eta+\xi=x] > 0$. Therefore, we apply Lemma 9.

The pdf of the conditional distribution of η given $\eta+\xi=x$ is equal to
\[ p_{\eta\mid\eta+\xi=x}(y) = \frac{1}{\mathsf{E}\, e^{-\frac{(\eta-x)^2}{2\sigma^2}}}\cdot \frac{e^y}{(1+e^y)^2}\, e^{-\frac{(y-x)^2}{2\sigma^2}}. \]
The pdf $p_{\eta\mid\eta+\xi=x}(y)$ is continuously differentiable. The conditional distribution has a finite kth moment because $y^k e^{-\frac{(y-x)^2}{2\sigma^2}}$ is bounded for any $k\in\mathbb{N}$. Hence, conditions 1) and 4) of Lemma 9 are satisfied.

Evaluate
\[ \ln p_{\eta\mid\eta+\xi=x}(y) = \ln\Big(\frac{e^y}{(e^y+1)^2}\Big) - \frac{(y-x)^2}{2\sigma^2} - \ln\Big(\mathsf{E}\, e^{-\frac{(\eta-x)^2}{2\sigma^2}}\Big) = F(y) + C, \]
where the function $F(y)$ is defined in (19), and $C = -\ln\big(\mathsf{E}\exp\big(-\frac{(\eta-x)^2}{2\sigma^2}\big)\big)$ depends only on x and $\sigma^2$ and does not depend on y.

We check condition 2) of Lemma 9:
\[ p_{\eta\mid\eta+\xi=x}(y) = e^{F(y)+C}; \qquad \frac{d}{dy}\, p_{\eta\mid\eta+\xi=x}(y) = F'(y)\, e^{F(y)+C}; \]
\[ \operatorname{sign}\Big(\frac{d}{dy}\, p_{\eta\mid\eta+\xi=x}(y)\Big) = \operatorname{sign}\big(F'(y)\big) = \operatorname{sign}(y_0 - y), \tag{24} \]
and condition 2) holds with $M=y_0$, where $y_0$ is defined just above (20).

Now check condition 3) of Lemma 9. The proof is illustrated by Fig. 4. Assume that $p_{\eta\mid\eta+\xi=x}(y_1) = p_{\eta\mid\eta+\xi=x}(y_2)$ and $y_1 < y_0 < y_2$.
Then $F(y_1) = F(y_2)$. Denote
\[ y_4 = (F')^{-1}\big(-F'(y_1)\big). \]
Then $F'(y_1) + F'(y_4) = F'(y_1) - F'(y_1) = 0$, and by (20), as $y_1 < y_0$, we have $F'(y_1) > 0$, $F'(y_4) < 0$, and $y_4 > y_0 > y_1$. By Lemma 11, $F(y_1) > F(y_4)$.

Fig. 4. To the proof of Lemma 10: checking condition 3) of Lemma 9.

Hence, $F(y_2) = F(y_1) > F(y_4)$. Because the function $F(y)$ is decreasing on $(y_0,+\infty)$ (see (20)), we have $y_2 < y_4$. Since the function $F'(y)$ is decreasing, $F'(y_2) > F'(y_4) = -F'(y_1)$, which implies $F'(y_1) + F'(y_2) > 0$. By (24) we then have
\[ \frac{d}{dy}\,p_{\eta\mid\eta+\xi=x}(y_1) + \frac{d}{dy}\,p_{\eta\mid\eta+\xi=x}(y_2) > 0, \]
which is condition 3) of Lemma 9.

All the conditions of Lemma 9 are satisfied. By Lemma 9, $\mu_3[\eta\mid\eta+\xi=x] > 0$, and by (22)–(23),
\[ L_4(x,\sigma^2)\,L_1(x,\sigma^2)^2 - 3\,L_3(x,\sigma^2)\,L_2(x,\sigma^2)\,L_1(x,\sigma^2) + 2\,L_2(x,\sigma^2)^3 > 0 \tag{25} \]
for all $x>0$ and $\sigma^2>0$.

Case 2: $x \le 0$ and $\sigma^2 > 0$. The distribution of $\eta+\xi$ is symmetric. Hence, $L_1(x,\sigma^2)$ and $L_3(x,\sigma^2)$ are even functions in x, and $L_2(x,\sigma^2)$ and $L_4(x,\sigma^2)$ are odd functions in x. Then
\[ L_4(x,\sigma^2)\,L_1(x,\sigma^2)^2 - 3\,L_3(x,\sigma^2)\,L_2(x,\sigma^2)\,L_1(x,\sigma^2) + 2\,L_2(x,\sigma^2)^3 \]
is an odd function in x. It is equal to 0 for $x=0$, and it is negative for $x<0$ by Case 1; see (25).

Case 3: $\sigma^2=0$. The function $L_1(x,0)$ is the pdf of the logistic distribution, and $L_{k+1}(x,0)$ is its kth derivative:
\[ L_1(x,0) = \frac{e^x}{(1+e^x)^2}; \qquad L_2(x,0) = \frac{e^x(1-e^x)}{(1+e^x)^3}; \]
\[ L_3(x,0) = \frac{e^x(1-4e^x+e^{2x})}{(1+e^x)^4}; \qquad L_4(x,0) = \frac{e^x(1-e^x)(1-10e^x+e^{2x})}{(1+e^x)^5}. \]
Then
\[ L_4 L_1^2 - 3 L_3 L_2 L_1 + 2 L_2^3 = -\frac{2\,e^{4x}(1-e^x)}{(1+e^x)^9}; \qquad \operatorname{sign}\big(L_4 L_1^2 - 3 L_3 L_2 L_1 + 2 L_2^3\big) = \operatorname{sign}(x), \]
where the $L_k$ are evaluated at the point $(x,0)$.

Lemma 10 is proven.  □
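Both the closed-form expression of Case 3 and the general sign claim of Lemma 10 admit a quick numerical spot-check. The sketch below (my own illustration; the quadrature order and test points are arbitrary choices) evaluates $L_1,\dots,L_4$ by Gauss–Hermite quadrature, reduces to the closed forms when $\sigma^2=0$, and verifies the sign pattern:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

S, W = hermgauss(128)

def L_derivs(x, sigma2):
    """(L1, L2, L3, L4) at (x, sigma2): x-derivatives of the smoothed logistic
    cdf, i.e. the logistic-cdf derivatives convolved with the N(0, sigma2)
    density. For sigma2 = 0 this reduces to the closed forms of Case 3."""
    t = x - np.sqrt(2.0 * sigma2) * S if sigma2 > 0 else np.array([x])
    w = W if sigma2 > 0 else np.array([np.sqrt(np.pi)])
    u = np.exp(t)
    D = 1.0 + u
    l = (u / D**2,
         u * (1 - u) / D**3,
         u * (1 - 4*u + u**2) / D**4,
         u * (1 - u) * (1 - 10*u + u**2) / D**5)
    return tuple(np.sum(w * lk) / np.sqrt(np.pi) for lk in l)

results = []
for x in (-3.0, -0.5, 0.5, 3.0):
    for s2 in (0.0, 0.25, 1.0, 4.0):
        L1, L2, L3, L4 = L_derivs(x, s2)
        expr = L4 * L1**2 - 3 * L3 * L2 * L1 + 2 * L2**3
        if s2 == 0.0:
            # Case 3 closed form: expr = -2 e^{4x} (1 - e^x) / (1 + e^x)^9
            closed = -2.0 * np.exp(4*x) * (1 - np.exp(x)) / (1 + np.exp(x))**9
            results.append(abs(expr - closed) < 1e-14)
        results.append(np.sign(expr) == np.sign(x))
```

Every check agrees with the sign pattern $\operatorname{sign}(\cdot)=\operatorname{sign}(x)$ stated in Lemma 10.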

References

[1] Burr, D.: On errors-in-variables in binary regression – Berkson case. J. Am. Stat. Assoc. 83(403), 739–743 (1988). MR0963801. doi:10.1080/01621459.1988.10478656

[2] Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.: Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press (2006). MR2243417. doi:10.1201/9781420010138

[3] Küchenhoff, H.: The identification of logistic regression models with errors in the variables. Stat. Pap. 36(1), 41–47 (1995). MR1334083. doi:10.1007/BF02926017

[4] Shklyar, S.V.: Logistic regression with homoscedastic errors – A Berkson model. Theory Probab. Math. Stat. 85, 169–180 (2012). MR2933712. doi:10.1090/S0094-9000-2013-00883-7