Identifiability of logistic regression with homoscedastic error: Berkson model

We consider the Berkson model of logistic regression with Gaussian homoscedastic error in the regressor. The measurement error variance can be either known or unknown, and we deal with both the functional and the structural case. Sufficient conditions for identifiability of the regression coefficients are presented. If the error variance is known, the regression parameters are identifiable whenever the distribution of the observed regressor is not concentrated at a single point. If the error variance is unknown, the regression parameters are identifiable whenever the distribution of the observed regressor is not concentrated at three (or fewer) points. The key analytic tools are relations between the smoothed logistic distribution function and its derivatives.


Introduction
Statistical model. Consider logistic regression with Berkson-type error in the explanatory variable. One trial is distributed as follows. $X_n^{\mathrm{obs}}$ is the observed (or assigned) surrogate regressor. The true regressor is $X_n = X_n^{\mathrm{obs}} + U_n$, where the error $U_n \sim N(0, \tau^2)$ is independent of $X_n^{\mathrm{obs}}$. The response $Y_n$ is a binary random variable that attains the values 0 and 1 with
$$\mathsf{P}\bigl(Y_n = 1 \mid X_n^{\mathrm{obs}}, X_n\bigr) = \frac{\exp(\beta_0 + \beta_1 X_n)}{1 + \exp(\beta_0 + \beta_1 X_n)}.$$
We consider both the functional model and the structural model. In the functional model, the $X_n^{\mathrm{obs}}$ are nonrandom, while in the structural model the $X_n^{\mathrm{obs}}$ are i.i.d.; therefore, in the latter model, $(X_n^{\mathrm{obs}}, X_n, Y_n)$ are i.i.d. random triples. The couples $(X_n^{\mathrm{obs}}, Y_n)$, $n = 1, \dots, N$, are observed. The vector $\beta = (\beta_0, \beta_1)^\top$ is the parameter of interest.
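To make the setting concrete, here is a minimal simulation sketch of the structural case, assuming illustrative parameter values; the names `simulate_trial`, `BETA0`, `BETA1`, and `TAU` are ours, not notation from the paper. Only the pairs `(x_obs, y)` would be available to the statistician; the true regressor stays hidden.

```python
import numpy as np

# Illustrative parameter values (not from the paper).
BETA0, BETA1, TAU = -0.5, 2.0, 0.7

rng = np.random.default_rng(seed=1)

def simulate_trial(x_obs, rng):
    """One Berkson trial per surrogate value: X = X_obs + U, U ~ N(0, TAU^2)."""
    x_true = x_obs + rng.normal(0.0, TAU, size=np.shape(x_obs))
    p = 1.0 / (1.0 + np.exp(-(BETA0 + BETA1 * x_true)))   # P(Y = 1 | X)
    return rng.binomial(1, p)

# Structural case: i.i.d. surrogates; only the pairs (x_obs, y) are observed.
x_obs = rng.normal(0.0, 1.0, size=1000)
y = simulate_trial(x_obs, rng)
```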
The error variance $\tau^2$ can be either known or unknown, and we consider both cases. Conditions for identifiability of the model (or of the parameter $\beta$) are presented.
Overview. Berkson models of logistic regression and probit regression were set up in Burr [1]. For probit regression, it is shown there that the introduction of Berkson-type error is equivalent to a rescaling of the regression parameters. As a consequence, the Berkson model of probit regression is identifiable if $\tau^2$ is known and is not identifiable if $\tau^2$ is unknown.
The identifiability of the classical (error-in-variables) model was studied by Küchenhoff [3], who assumes that both the regressor and the measurement error are normally distributed. Then univariate logistic regression is identifiable (here $\tau^2$ can be unknown), while multiple logistic regression is not. Our results could be proved similarly to [3] if we assumed that the distribution of the surrogate regressor $X^{\mathrm{obs}}$ has unbounded support.
For a classification of errors-in-variables regression models and various estimation methods, see the monograph by Carroll et al. [2].
Identifiability of the statistical model can be used in the proof of consistency of estimators. For known $\tau^2$, the strong consistency of the maximum likelihood estimator is obtained by Shklyar [4]. But if $\tau^2$ is unknown, the maximum likelihood estimator seems to be unstable (see the discussion in [2] or [3]).

Convolution of logistic function with normal density
Consider the function
$$L_k(x, \sigma^2) = \mathsf{E}\,\ell^{(k)}(x + \zeta), \qquad \zeta \sim N(0, \sigma^2), \quad k = 0, 1, 2, \dots,$$
where $\ell(x) = e^x/(1 + e^x)$ is the logistic function and $\ell^{(k)}$ is its $k$th derivative; that is, $L_0(x, 0) = e^x/(1 + e^x)$ and $L_k(x, \sigma^2) = \partial^k L_0(x, \sigma^2)/\partial x^k$. Differentiation of $L_k(x, \sigma^2)$ with respect to the second argument is described in Appendix A.
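Numerically, $L_k$ can be approximated by Gauss–Hermite quadrature after taking the $k$th derivative of the logistic function symbolically. The following is a possible sketch, assuming NumPy and SymPy; the helper name `make_Lk` is ours.

```python
import numpy as np
import sympy as sp

_x = sp.symbols('x')
_logistic = sp.exp(_x) / (1 + sp.exp(_x))

def make_Lk(k, n_nodes=80):
    """Build a numerical L_k(x, sigma2) = E[ l^(k)(x + zeta) ], zeta ~ N(0, sigma2).

    The k-th derivative of the logistic function is taken symbolically, and the
    Gaussian expectation is approximated by Gauss-Hermite quadrature.
    """
    dk = sp.lambdify(_x, sp.diff(_logistic, _x, k), 'numpy')
    t, w = np.polynomial.hermite.hermgauss(n_nodes)

    def Lk(x, sigma2):
        z = x + np.sqrt(2.0 * sigma2) * t   # substitution zeta = sqrt(2 sigma2) * t
        return float(np.dot(w, dk(z)) / np.sqrt(np.pi))

    return Lk

L0 = make_Lk(0)
print(L0(0.3, 0.0))   # equals exp(0.3) / (1 + exp(0.3)) since sigma^2 = 0
```

At $\sigma^2 = 0$ the quadrature collapses to the plain logistic function, which gives a quick sanity check.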
The distribution of $Y_n$ given $X_n^{\mathrm{obs}}$ is determined by
$$\mathsf{P}\bigl(Y_n = 1 \mid X_n^{\mathrm{obs}}\bigr) = L_0\bigl(\beta_0 + \beta_1 X_n^{\mathrm{obs}},\ \beta_1^2 \tau^2\bigr).$$

Identifiability when $\tau^2$ is known

Theorem 1. If in the functional model not all $X_n^{\mathrm{obs}}$ are equal, then the model is identifiable.
Proof. Suppose that for two values of the parameter, $\beta^{(1)} \neq \beta^{(2)}$, the distributions of the observations coincide. Then for all $i = 1, 2, \dots, N$,
$$L_0\bigl(\beta_0^{(1)} + \beta_1^{(1)} X_i^{\mathrm{obs}},\ (\beta_1^{(1)})^2 \tau^2\bigr) = L_0\bigl(\beta_0^{(2)} + \beta_1^{(2)} X_i^{\mathrm{obs}},\ (\beta_1^{(2)})^2 \tau^2\bigr).$$
However, by Lemma 4.1 from [4], the equation
$$L_0\bigl(\beta_0^{(1)} + \beta_1^{(1)} x,\ (\beta_1^{(1)})^2 \tau^2\bigr) = L_0\bigl(\beta_0^{(2)} + \beta_1^{(2)} x,\ (\beta_1^{(2)})^2 \tau^2\bigr)$$
has no more than one solution $x$. Hence all the $X_i^{\mathrm{obs}}$ are equal, which contradicts the hypothesis of the theorem.
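As a numerical illustration of the single-crossing property invoked here (Lemma 4.1 of [4]), one can sample the difference of the two response curves and count sign changes. The sketch below reuses `L0 = make_Lk(0)` from the previous sketch; the parameter values are arbitrary.

```python
import numpy as np

tau = 0.7
beta_a = (-0.5, 2.0)   # beta^(1)
beta_b = (0.3, 1.2)    # beta^(2)

xs = np.linspace(-15.0, 15.0, 3001)
g = np.array([L0(beta_a[0] + beta_a[1] * x, (beta_a[1] * tau) ** 2)
              - L0(beta_b[0] + beta_b[1] * x, (beta_b[1] * tau) ** 2) for x in xs])

# Lemma 4.1 of [4]: the two response curves cross at most once when tau^2 is known.
print(np.sum(np.diff(np.sign(g)) != 0))   # expected: at most 1
```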
By definition, a degenerate distribution is a distribution concentrated at a single point. For the proof of the next theorem, see the proof of Theorem 5.1 in [4].

Theorem 2 ([4]).
If in the structural model the distribution of $X_1^{\mathrm{obs}}$ is not degenerate, then the parameter $\beta$ is identifiable.

Identifiability when $\tau^2$ is unknown

Lemma 4. The equation
$$L_0\bigl(\beta_0^{(1)} + \beta_1^{(1)} x,\ \sigma_1^2\bigr) = L_0\bigl(\beta_0^{(2)} + \beta_1^{(2)} x,\ \sigma_2^2\bigr) \tag{6}$$
has no more than three solutions $x$, unless either
$$\beta^{(1)} = \beta^{(2)} \text{ and } \sigma_1^2 = \sigma_2^2, \tag{7}$$
or
$$\beta_1^{(1)} = \beta_1^{(2)} = 0 \text{ and } L_0\bigl(\beta_0^{(1)}, \sigma_1^2\bigr) = L_0\bigl(\beta_0^{(2)}, \sigma_2^2\bigr). \tag{8}$$
In the exceptional cases (7) and (8), equation (6) is an identity.
Proof. The idea of the proof is as follows: if a twice differentiable function $y(x)$ satisfies (4), then its plot either is a straight line (if $\sigma_1^2 = \sigma_2^2$) or intersects any straight line at no more than three points.
Consider four cases.

Case 1. $\beta_1^{(1)} = \beta_1^{(2)} = 0$. Both sides of (6) are constant in $x$, so equation (6) either holds for all $x$ or does not hold for any $x$.

Case 2. Exactly one of $\beta_1^{(1)}$, $\beta_1^{(2)}$ equals 0. Then one side of (6) is constant in $x$ while the other is strictly monotone and attains all values in $(0, 1)$, so (6) has exactly one solution.

Case 3. $\sigma_1^2 = \sigma_2^2$ and $\beta_1^{(1)} \beta_1^{(2)} \neq 0$. Since $L_0(\cdot, \sigma^2)$ is strictly increasing, (6) is equivalent to $\beta_0^{(1)} + \beta_1^{(1)} x = \beta_0^{(2)} + \beta_1^{(2)} x$. This equation has one solution if $\beta_1^{(1)} \neq \beta_1^{(2)}$, and it has no solutions if $\beta_1^{(1)} = \beta_1^{(2)}$ and $\beta_0^{(1)} \neq \beta_0^{(2)}$. If $\beta^{(1)} = \beta^{(2)}$, equation (6) holds for all $x$.
Case 4. $\sigma_1^2 \neq \sigma_2^2$ and $\beta_1^{(1)} \beta_1^{(2)} \neq 0$. Define the function $z_1(z_2)$ from the equation $L_0(z_1, \sigma_1^2) = L_0(z_2, \sigma_2^2)$. The function $z_1(z_2)\colon \mathbb{R} \to \mathbb{R}$ is implicitly defined by Eq. (4): there the equality holds if and only if $y = z_1(x)$. Hence the function $z_1(z_2)$ satisfies Lemma 3. Equation (9) is equivalent to (10). By Lemma 3, the derivative (11) of the left-hand side of (10) is strictly monotone on both intervals $(-\infty, 0]$ and $[0, +\infty)$, and hence (11) attains 0 at no more than two points. Then the left-hand side of (10) has no more than three intervals of monotonicity, and Eq. (10) has no more than three solutions. Equation (6) has the same number of solutions.
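As a numerical companion to Lemma 4 (an illustration, not a proof), the sketch below brackets and locates the crossings for one illustrative choice of parameters with $\sigma_1^2 \neq \sigma_2^2$, reusing `L0` from the earlier sketch.

```python
import numpy as np
from scipy.optimize import brentq

def g(x):
    # L_0(beta^(1) terms, sigma_1^2) - L_0(beta^(2) terms, sigma_2^2)
    return L0(-0.1 + 4.0 * x, 9.0) - L0(1.0 * x, 0.25)

xs = np.linspace(-30.0, 30.0, 6001)
vals = np.array([g(x) for x in xs])
roots = [brentq(g, xs[i], xs[i + 1])
         for i in range(len(xs) - 1) if vals[i] * vals[i + 1] < 0]
print(len(roots), roots)   # Lemma 4: never more than three crossings
```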
Theorem 5. If in the functional model there are four different values among the $X_n^{\mathrm{obs}}$, then the parameters $\beta$ and $\beta_1^2 \tau^2$ are identifiable.
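Theorem 5 invites a numerical probe: fix four distinct design points, compute the exact response probabilities, and recover $(\beta_0, \beta_1, \beta_1^2\tau^2)$ by solving the four moment equations. The sketch below does this with SciPy's `least_squares`, reusing `L0`; all numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

x_pts = np.array([-2.0, -0.5, 1.0, 2.5])          # four distinct design points
truth = np.array([-0.5, 2.0, (2.0 * 0.7) ** 2])   # (beta_0, beta_1, beta_1^2 tau^2)
target = np.array([L0(truth[0] + truth[1] * x, truth[2]) for x in x_pts])

def resid(theta):
    b0, b1, v = theta
    return np.array([L0(b0 + b1 * x, v) for x in x_pts]) - target

fit = least_squares(resid, x0=[0.0, 1.0, 1.0],
                    bounds=([-np.inf, -np.inf, 0.0], [np.inf, np.inf, np.inf]))
print(fit.x)   # should reproduce `truth` up to numerical error
```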
A Differentiation of $L_k$

The partial derivatives of $L_k(x, v)$ are
$$\frac{\partial L_k(x, v)}{\partial x} = L_{k+1}(x, v), \qquad \frac{\partial L_k(x, v)}{\partial v} = \tfrac{1}{2}\, L_{k+2}(x, v).$$
Since the distribution of $\zeta$ is symmetric, $L_0(-x, \sigma^2) = 1 - L_0(x, \sigma^2)$; that is, $L_1(x, \sigma^2)$ and $L_3(x, \sigma^2)$ are even functions in $x$, and $L_2(x, \sigma^2)$ and $L_4(x, \sigma^2)$ are odd functions in $x$.
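Both identities, and the parity claims, are easy to check numerically. The following sketch compares a central finite difference in $v$ with $\tfrac{1}{2}L_3$, reusing `make_Lk` from the sketch in the convolution section; the test point is arbitrary.

```python
import numpy as np

L1, L2, L3 = make_Lk(1), make_Lk(2), make_Lk(3)

x, v, h = 0.8, 1.3, 1e-5
dv = (L1(x, v + h) - L1(x, v - h)) / (2.0 * h)   # central difference in v
print(dv, 0.5 * L3(x, v))                        # should agree: dL_1/dv = L_3 / 2

print(L1(-x, v), L1(x, v))     # equal: L_1 is even in x
print(L2(-x, v), -L2(x, v))    # equal: L_2 is odd in x
```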

B The key inequality
The next lemma is similar to Lemma 2.1 in [4]. Hence, the proof is brief; see [4] for details.
Lemma 7. Let $\xi$ and $\eta$ be two independent random variables, where $\xi \sim N(0, 1)$. Denote $\zeta = \xi + \eta$ and let $p_\zeta(z)$ be the pdf of $\zeta$. Then
$$\frac{d^3}{dz^3} \ln p_\zeta(z) = \mu_3[\eta \mid \zeta = z],$$
where $\mu_3[\eta \mid \zeta = z]$ is the third conditional central moment,
$$\mu_3[\eta \mid \zeta = z] = \mathsf{E}\bigl[(\eta - \mathsf{E}[\eta \mid \zeta = z])^3 \,\big|\, \zeta = z\bigr].$$

Proof. We have
$$p_\zeta(z) = \mathsf{E}\,\varphi(z - \eta) = \int_{-\infty}^{+\infty} \varphi(z - y)\, dF_\eta(y), \tag{13}$$
where $\varphi$ is the standard normal pdf. If $\eta$ has a pdf, the conditional pdf of $\eta$ given $\zeta = z$ is equal to $\varphi(z - y)\, p_\eta(y)/p_\zeta(z)$; otherwise, we can use the density of the conditional distribution of $\eta$ given $\zeta = z$ w.r.t. the marginal distribution of $\eta$, which equals $\varphi(z - y)/p_\zeta(z)$. In either case, the conditional moments of $\eta$ given $\zeta = z$ are equal to
$$\mathsf{E}[\eta^k \mid \zeta = z] = \frac{1}{p_\zeta(z)} \int_{-\infty}^{+\infty} y^k\, \varphi(z - y)\, dF_\eta(y). \tag{14}$$
From (13) and (14) it follows that
$$\frac{d}{dz} \ln p_\zeta(z) = \mathsf{E}[\eta \mid \zeta = z] - z, \qquad \frac{d^2}{dz^2} \ln p_\zeta(z) = \operatorname{Var}[\eta \mid \zeta = z] - 1,$$
and differentiating once more gives the stated identity.

Corollary 8. Let $\xi$ and $\eta$ be independent random variables such that $\xi \sim N(\mu, \sigma^2)$. Denote $\zeta = \xi + \eta$, and denote the pdf of $\zeta$ by $p_\zeta(z)$. Then
$$\frac{d^3}{dz^3} \ln p_\zeta(z) = \frac{\mu_3[\eta \mid \zeta = z]}{\sigma^6}.$$
A numerical check of the identity of Lemma 7 is given after the proof of Lemma 9.

Lemma 9. Assume that the distribution of a random variable $X$ satisfies the following conditions:

1) $X$ has a continuously differentiable density $p_X(x)$.
2) $X$ is unimodal in the following sense: there exists a mode $M \in \mathbb{R}$ such that for all $x \in \mathbb{R}$, $\operatorname{sign}(p_X'(x)) = \operatorname{sign}(M - x)$.

4) $\mathsf{E}|X|^3 < \infty$.
Then $\mu_3(X) := \mathsf{E}(X - \mathsf{E}X)^3 > 0$.

Proof. 1) $\mathsf{E}X > M$. Denote by $x_1(z)$ and $x_2(z)$, with $x_1(z) \le M \le x_2(z)$, the solutions to the equation $p_X(x) = z$ (see Fig. 2). Represent the expectation as a double integral and change the order of integration; this yields (15). For all $x_2 > M$, the branches $x_1(z)$ and $x_2(z)$ are differentiable by the implicit function theorem. Note that $x_1(p_X(M)) = M$. By the Lagrange theorem, the inner difference in (15) can be expressed through a derivative at an intermediate point, for some $\theta \in (0, 1)$; the last integrand in (15) is positive, and then (15) implies $\mathsf{E}X > M$.
2) Consider the function
$$f(t) = p_X(\mathsf{E}X + t) - p_X(\mathsf{E}X - t),$$
which is odd and strictly decreasing on the interval $[-(\mathsf{E}X - M),\ \mathsf{E}X - M]$. Therefore, $f(t)$ attains 0 only once on this interval, namely at the point 0 (see Fig. 3). If $t > \mathsf{E}X - M$ (more generally, $|t| > \mathsf{E}X - M$) and $f(t) = 0$, then $f'(t) = p_X'(\mathsf{E}X + t) + p_X'(\mathsf{E}X - t) > 0$ by condition 3) of Lemma 9. Therefore, $f(t)$ can attain 0 only once on $(\mathsf{E}X - M, +\infty)$, and if it attains 0 (say, at a point $t_1 > \mathsf{E}X - M > 0$), it is increasing in a neighborhood of $t_1$.
Hence, there are two possible patterns of sign change of $f(t)$ (Fig. 3): either
$$f(t) < 0 \text{ for } t \in (0, t_1) \text{ and } f(t) > 0 \text{ for } t \in (t_1, +\infty), \tag{16}$$
or
$$f(t) < 0 \text{ for all } t > 0. \tag{17}$$

3) We have
$$\int_0^{+\infty} t\, f(t)\, dt = \mathsf{E}(X - \mathsf{E}X) = 0, \tag{18}$$
where $f(t)$ is defined in the second part of the proof. Note that the case (17) is impossible because otherwise the last integrand in (18) would be negative and thus the integral could not be equal to 0.

4) Similarly to (18),
$$\int_0^{+\infty} t^3 f(t)\, dt = \mathsf{E}(X - \mathsf{E}X)^3.$$
Subtract $t_1^2$ times Eq. (18), where $t_1$ comes from (16):
$$\mathsf{E}(X - \mathsf{E}X)^3 = \int_0^{+\infty} t\,(t^2 - t_1^2)\, f(t)\, dt.$$
The integrand is positive for $t > 0$, $t \neq t_1$, and hence $\mu_3[X] = \mathsf{E}(X - \mathsf{E}X)^3 > 0$.
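Here is a numerical check of the identity of Lemma 7 under an assumed two-point distribution for $\eta$, for which the conditional third central moment is available in closed form; the value $a = 1.5$ and the evaluation point are arbitrary.

```python
import numpy as np
from scipy.stats import norm

a = 1.5   # eta takes the values 0 and a with probability 1/2 each

def log_p(z):
    """ln p_zeta(z) for zeta = xi + eta, xi ~ N(0, 1)."""
    return np.log(0.5 * (norm.pdf(z) + norm.pdf(z - a)))

z, h = 0.4, 1e-3
d3 = (log_p(z + 2 * h) - 2 * log_p(z + h)
      + 2 * log_p(z - h) - log_p(z - 2 * h)) / (2 * h ** 3)

q = norm.pdf(z - a) / (norm.pdf(z) + norm.pdf(z - a))   # P(eta = a | zeta = z)
mu3 = a ** 3 * q * (1 - q) * (1 - 2 * q)                # third conditional central moment
print(d3, mu3)                                          # should agree to ~1e-5
```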

Lemma 10. For all $x \in \mathbb{R}$ and $\sigma^2 \ge 0$, the key inequality of this appendix holds for the functions $L_k(x, \sigma^2)$.

Lemma 11 is needed to prove Lemma 10. The notation $F(y)$ and $y_0$ is common to Lemmas 10 and 11. In particular, Lemma 11 states, among other properties, that $F(y)$ is an even function, strictly increasing on $[0, +\infty)$ and attaining only negative values.
Apply the already proven part 3) of Lemma 11. If $t > 0$, then, by the Lagrange theorem, the increment under consideration equals $t$ times the derivative taken at some point $t_1 \in (0, t)$.
Case 3. $\sigma^2 = 0$. The function $L_1(x, 0)$ is the pdf of the logistic distribution, and $L_{k+1}(x, 0)$ is its $k$th derivative:
$$L_2 = L_1 (1 - 2 L_0), \qquad L_3 = L_2 (1 - 2 L_0) - 2 L_1^2, \qquad L_4 = L_3 (1 - 2 L_0) - 6 L_1 L_2,$$
where the $L_k$ are evaluated at the point $(x, 0)$. Lemma 10 is proven.
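These relations can be confirmed symbolically; a minimal check, assuming SymPy, for the first two:

```python
import sympy as sp

x = sp.symbols('x')
l = sp.exp(x) / (1 + sp.exp(x))
l1, l2, l3 = (sp.diff(l, x, k) for k in (1, 2, 3))

print(sp.simplify(l2 - l1 * (1 - 2 * l)))                # 0
print(sp.simplify(l3 - (l2 * (1 - 2 * l) - 2 * l1**2)))  # 0
```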