On a bound of the absolute constant in the Berry--Esseen inequality for i.i.d. Bernoulli random variables

It is shown that the absolute constant in the Berry--Esseen inequality for i.i.d. Bernoulli random variables is strictly less than the Esseen constant if $1\le n\le 500000$, where $n$ is the number of summands. This result is obtained with the help of both a supercomputer and an interpolation theorem, which is also proved in the paper. In addition, applying the method developed by S. Nagaev and V. Chebotarev in 2009--2011, an upper bound is obtained for the absolute constant in the Berry--Esseen inequality in the case under consideration, which differs from the Esseen constant by no more than 0.06%. As an auxiliary result, we prove a bound in the local Moivre--Laplace theorem which has a simple and explicit form. Although the best possible result was obtained by J. Schulz in 2016, we present our approach to the problem of finding the absolute constant in the Berry--Esseen inequality for two-point distributions, since this approach, combining analytical methods and the use of computers, could be useful in solving other mathematical problems.


Introduction
Let us consider the class $V$ of all probability distributions on the real line $\mathbb{R}$ which have zero mean, unit variance and finite third absolute moment. Let $X, X_1, X_2, \ldots, X_n$ be i.i.d. random variables, where the distribution of $X$ belongs to $V$. Denote $\beta_3 = E|X|^3$, and let $\Phi(x)$ be the standard normal distribution function. According to the Berry-Esseen inequality [2,5], there exists an absolute constant $C_0$ such that for all $n = 1, 2, \ldots$, $\sup_{x\in\mathbb{R}} \left| P\left(\frac{X_1+\cdots+X_n}{\sqrt{n}} < x\right) - \Phi(x) \right| \le C_0\,\frac{\beta_3}{\sqrt{n}}$ (1). The first upper bounds for the constant $C_0$ were obtained by C.-G. Esseen [5] (1942), H. Bergström [1] (1949) and K. Takano [30] (1951).
In 1956 C.-G. Esseen [6] showed that where $C_E = \frac{3+\sqrt{10}}{6\sqrt{2\pi}} = 0.409732\ldots$. He also found a two-point distribution for which equality holds in (2), and proved the uniqueness of such a distribution (up to a reflection).
Consequently, $C_0 \ge C_E$. The result of Esseen served as an argument for the conjecture that V.M. Zolotarev advanced in 1966 [38]. The question whether the conjecture is correct remains open up to now. Since then, a number of upper bounds for $C_0$ have been obtained. A historical review can be found, for example, in [11,17,28]. We only note that recent results in this field were obtained by I.S. Tyurin (see, for example, [31-35]), V.Yu. Korolev and I.G. Shevtsova (see, for example, [11,13]), and I.G. Shevtsova (see, for example, [25-29]). The best upper estimate known to date belongs to Shevtsova: $C_0 \le 0.469$ [28]. Note that in obtaining upper bounds, beginning from the estimates in [38,39], calculations play an essential role. In addition, because of the large amount of computations, it was necessary to use computers.
The present paper is devoted to estimation of $C_0$ in the particular case of i.i.d. Bernoulli random variables. In this case we will use the notation $C_{02}$ instead of $C_0$. Let us recall the chronology of the results along these lines.
In 2007 C. Hipp and L. Mattner published an analytical proof of the inequality $C_{02} \le \frac{1}{\sqrt{2\pi}}$ in the symmetric case [8].
In 2009 the second and third authors of the present paper suggested a compound method in which a refinement of the C.L.T. for i.i.d. Bernoulli random variables was used along with direct calculations [17]. In the asymmetric case this method makes it possible to obtain majorants for $C_{02}$ arbitrarily close to $C_E$, provided that the computer used is sufficiently powerful. The main content of the preprint [17] was published in 2011 and 2012 in the papers [18,19]. In these papers, the following bound was proved: $C_{02} < 0.4215$.
In 2015 we obtained the bound by applying the same approach as in [17-19], with the only difference that this time a supercomputer was used instead of an ordinary PC. We announced bound (4) in [20] but, for a number of reasons, delayed publishing the proof, and do it only now. While the present work was in preparation, we detected a small inaccuracy in the calculations, namely, bound (4) must be increased by $10^{-7}$. Thus the following statement is true. holds.
Meanwhile, in 2016 J. Schulz [23] obtained the unimprovable result: if the symmetry condition is violated, $C_{02} = C_E$. As might be expected, J. Schulz's proof turned out to be very long and complicated. It should be said that methods based on the use of computers and analytical methods complement each other. The former cannot lead to a final result, but they do not require so much effort. On the other hand, they allow us to predict the exact result, and thus facilitate theoretical research.

Shortly about the proof of Theorem 1

Some notations. On the choice of the left boundary of the interval for p
Let $X, X_1, X_2, \ldots, X_n$ be a sequence of independent random variables with the same distribution: In what follows we use the following notations, Obviously, In this paper we solve, in particular, the problem of computing the sequence $T(n) = \sup_{p\in(0,0.5)} T_n(p)$ for all $n$ such that $1 \le n \le N_0$. Here and in what follows, $N_0 = 500000$. Note that for fixed $n$ and $p$, the quantity $\sup_{x\in\mathbb{R}} |F_{n,p}(x) - G_{n,p}(x)|$ is achieved at some discontinuity point of the function $F_{n,p}(x)$ (see Lemma 2). We consider distribution functions that are continuous from the left. Consequently, where $i$ are integers, Note also that we can vary the parameter $p$ in a narrower interval than $[0, 0.5]$, namely, in $I = [0.1689, 0.5]$. This conclusion follows from the next statement.
Lemma 1 is proved in Section 4 with the help of a modification of the Berry-Esseen inequality (with numerical constants) obtained in [10,12].
Remark 1. By the same method that is used to prove inequality (10), the estimate $T_n(p) \le 0.369$ is found in [19] in the case $0 < p < 0.02$ ($n \ge 1$) (see the proof of (1.37) in [19]), where an earlier estimate of V. Korolev and I. Shevtsova [11] is used instead of [10,12]. Note that the use of modified inequalities of the Berry-Esseen type, obtained in [10,11,12], is not necessary for obtaining estimates of $T_n(p)$ when $p$ is close to 0.
An alternative approach, using Poisson approximation, is proposed in the preprint [17]. Let us explain the essence of this method.
An alternative bound is found in the domain $\{(p, n) : 0.258 \le \lambda \le 6,\ n \ge 200\}$, where $\lambda = np$. Under these conditions, we have $p \le 0.03$, i.e. $p$ is small enough. Consequently, the error arising from the replacement of the binomial distribution by the Poisson distribution $\Pi_\lambda$ with parameter $\lambda$ is small.
Next, the distance $d(\Pi_\lambda, G_\lambda)$ between $\Pi_\lambda$ and the normal distribution $G_\lambda$ with mean $\lambda$ and variance $\lambda$ is estimated, where $d(U, V) = \sup_{x\in\mathbb{R}} |U(x) - V(x)|$ for any distribution functions $U(x)$ and $V(x)$. Then an estimate of the distance between $G_\lambda$ and the normal distribution $G_{n,p}$ with mean $\lambda$ and variance $npq$ is deduced. Summing the obtained estimates, we arrive at an estimate for the distance between the original binomial distribution and $G_{n,p}$. As a result, in [17, Lemma 7.8, Theorem 7.2] we derive the estimate $T_n(p) < 0.3607$, which is valid for all points $(p, n)$ in the indicated domain.
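The first step of this chain, replacing the binomial distribution by the Poisson one, can be checked numerically. The following is only a minimal sketch and not the estimate derived in [17]: it evaluates the uniform distance $d$ between the binomial distribution with parameters $(n, p)$ and $\Pi_\lambda$, $\lambda = np$, directly at the integer points, where both step functions jump. The function names are hypothetical and chosen for illustration.

```python
import math


def binomial_cdf_values(n, p):
    """Return the values P(S_n <= k), k = 0..n, for the binomial (n, p) law."""
    q = 1.0 - p
    total, values = 0.0, []
    for k in range(n + 1):
        total += math.comb(n, k) * p**k * q**(n - k)
        values.append(total)
    return values


def poisson_cdf_values(lam, k_max):
    """Return the values P(N <= k), k = 0..k_max, for the Poisson(lam) law."""
    term = math.exp(-lam)  # P(N = 0)
    total, values = 0.0, []
    for k in range(k_max + 1):
        total += term
        values.append(total)
        term *= lam / (k + 1)  # P(N = k + 1) from P(N = k)
    return values


def binomial_poisson_distance(n, p):
    """Uniform distance between the binomial (n, p) and Poisson(n * p) laws.

    Both distribution functions are constant between integers, and for k > n
    the difference only shrinks, so the maximum over k = 0..n is the supremum.
    """
    binom = binomial_cdf_values(n, p)
    poisson = poisson_cdf_values(n * p, n)
    return max(abs(b - c) for b, c in zip(binom, poisson))


if __name__ == "__main__":
    # A corner of the domain discussed above: n = 200, lambda = 6, p = 0.03.
    print(binomial_poisson_distance(200, 0.03))
```

For $n = 200$ and $p = 0.03$ the computed distance is small compared with the bound $0.3607$ on $T_n(p)$, which is in line with the remark that the replacement error is small when $p \le 0.03$.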
It was proved in [19] that $\overline{C}_{02}(200) < 0.4215$ (the bound for $n \ge 200$). By that time it was shown with the help of a computer (see the preprint [9]) that $C_{02}(200) < 0.4096$ (the bound for $1 \le n \le 200$), i.e.
and thus, $C_{02} < 0.4215$ for all $n \ge 1$. Some words about bound (11). By (8), to get $C_{02}(N)$ it is enough to calculate $T(n) = \sup_{p\in(0,0.5]} T_n(p)$ for every $1 \le n \le N$, and then find $\max_{1\le n\le N} T(n)$. The calculation of $T(n)$ reduces to two problems. The first problem is to calculate $\max_{p_j\in S} T_n(p_j)$, where $S$ is a grid on $(0, 0.5]$, and the second one is to estimate $T_n(p)$ at intermediate points $p$. Both problems were solved in [9] for $1 \le n \le 200$.
It should be noted here that, according to the method, the quantity $C_{02}(N)$ (corresponding to $1 \le n \le N$) is calculated (with some accuracy), and the quantity $\overline{C}_{02}(N)$ (corresponding to $n \ge N$) is estimated from above. In both cases, a computer is required. The power of an ordinary PC is sufficient for calculating majorants for $\overline{C}_{02}(N)$, whereas to calculate $C_{02}(N)$ a supercomputer is needed if $N$ is sufficiently large. Moreover, an additional investigation of the interpolation type is required to draw a convincing conclusion from computer calculations of $C_{02}(N)$. In our paper, Theorem 2 plays this role.
Denote by $S$ the uniform grid on $I$ with the step $h = 10^{-12}$. The values of $T_n(p_j)$ for all $p_j \in S$ and $1 \le n \le N_0$ were calculated on a supercomputer.
The counting algorithm is a triple loop: a loop with respect to the parameter i (see (9)) is nested in a loop with respect to the parameter p, which in turn is nested in the loop with respect to the parameter n.
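The triple loop can be illustrated on a toy scale. The sketch below is not the registered C+MPI program [40]; it is a plain Python illustration under two assumptions that are not visible in the text above: that $T_n(p) = \frac{\sqrt{n}}{\varrho(p)} \sup_x |F_{n,p}(x) - G_{n,p}(x)|$, and that $\varrho(p) = \frac{p^2+q^2}{\sqrt{pq}}$ is the Lyapunov ratio of the Bernoulli distribution. It uses a very coarse grid in $p$ and does not restrict the range of $i$. All function names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Distribution function of the normal law with the given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def rho(p):
    """Assumed Lyapunov ratio of the Bernoulli(p) law: (p^2 + q^2) / sqrt(pq)."""
    q = 1.0 - p
    return (p * p + q * q) / math.sqrt(p * q)


def t_value(n, p):
    """Assumed T_n(p): sqrt(n) / rho(p) times the uniform distance.

    The supremum is attained at a jump point i of the binomial distribution
    function, so the normal value at i is compared with both one-sided limits.
    """
    q = 1.0 - p
    mean, std = n * p, math.sqrt(n * p * q)
    delta = 0.0
    cdf_left = 0.0  # P(S_n < i), the left-continuous value at i
    for i in range(n + 1):  # innermost loop: over i
        pmf = math.comb(n, i) * p**i * q**(n - i)
        cdf_right = cdf_left + pmf  # P(S_n <= i)
        g = normal_cdf(i, mean, std)
        delta = max(delta, abs(cdf_left - g), abs(cdf_right - g))
        cdf_left = cdf_right
    return math.sqrt(n) * delta / rho(p)


def max_t_on_grid(n_max, p_lo, p_hi, steps):
    """Maximum of T_n(p_j) over 1 <= n <= n_max and a uniform grid in p."""
    best = 0.0
    for n in range(1, n_max + 1):  # outer loop: over n
        for j in range(steps + 1):  # middle loop: over grid nodes p_j
            p = p_lo + (p_hi - p_lo) * j / steps
            best = max(best, t_value(n, p))
    return best


if __name__ == "__main__":
    print(max_t_on_grid(20, 0.1689, 0.5, 20))
```

The sketch relies on `math.comb`, so it is only suitable for small $n$; the values of $n$ up to $N_0$ treated in the paper require a different evaluation of the binomial probabilities and far more computing power.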
As $n$ grew, the computation time increased rapidly. For example, for $2000 \le n \le 2100$ the calculations took more than 3 hours on a computer with a Core 2 Duo E6400 processor. For $2101 \le n \le N_0$ the calculations were carried out on the supercomputer Blue Gene/P.
It follows from [20, Corollary 7] that for $n > 200$ in the loop with respect to $i$, one need not take all values of $i$ from 0 to $n$, but only those which satisfy the inequality where $\nu = 3 + \sqrt{6}$. This led to a significant reduction of computation time. We give information about the computer time (without waiting for the queue) in Table 1. Calculations were carried out on the supercomputer Blue Gene/P of the Computational Mathematics and Cybernetics Faculty of Lomonosov Moscow State University. After some changes in the algorithm, the calculations for $n$ such that $490000 \le n \le N_0$ were also performed on the CC FEB RAS Computing Cluster [41]. The corresponding computer time was 6 hours and 40 minutes.
The program is written in C+MPI and registered [40].

Interpolation type results
Let $p^* \in (0, 0.5)$. Consider a uniform grid on $[p^*, 0.5]$ with a step $h$. The following statement allows one to estimate the value of the function $\frac{1}{\varrho(p)}\Delta_{n,k}(p)$ at an arbitrary point of the interval $[p^*, 0.5]$ via the value of this function at the nearest grid node and $h$. Denote
Theorem 2. Let $0 < p^* < p \le 0.5$, and let $p'$ be the node of a grid with a step $h$ on the interval $[p^*, 0.5]$ closest to $p$. Then for all $n \ge 1$ and $0 \le k \le n$,
The next statement follows from Theorem 2. Note that without it the proof of Theorem 1 would be incomplete.
Corollary 1. If $p \in I$, and $p'$ is the node of the grid $S$ closest to $p$, then for all
Proof. It follows from Theorem 2 that for Since $L(0.1689) < 12.98$, the right-hand side of inequality (15) is majorized by the number $4.6 \cdot 10^{-9}$. This implies the statement of Corollary 1.

On the proof of Theorem 1
It follows from (12), Corollary 1 and Lemma 1 that for all $1 \le n \le N_0$ and $p \in (0, 0.5]$, the following inequality holds: $T_n(p) < 0.4097321346 < C_E$ (for details, see (64)). It is easy to verify that this inequality is true for $p \in (0.5, 1)$ as well. Hence, inequality (5) implies Theorem 1.
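The numerical inequality $0.4097321346 < C_E$ used here is easy to confirm in double precision. The snippet below is a small check, assuming only the closed form $C_E = \frac{3+\sqrt{10}}{6\sqrt{2\pi}}$; the variable names are illustrative.

```python
import math

# Esseen constant in closed form.
C_E = (3 + math.sqrt(10)) / (6 * math.sqrt(2 * math.pi))

# Bound on T_n(p) quoted in the text for 1 <= n <= N_0.
COMPUTED_BOUND = 0.4097321346

# Room left between the computed bound and the Esseen constant.
GAP = C_E - COMPUTED_BOUND

if __name__ == "__main__":
    print(f"C_E = {C_E:.10f}, gap = {GAP:.3e}")
```

The gap is of order $10^{-8}$, which shows how little room the computations leave and why the interpolation error has to be controlled at the level of $10^{-9}$.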

About structure of the paper
The structure of the paper is as follows. The proof of Theorem 2, the main analytical result of the paper, is given in Section 3. The proof consists of 12 lemmas. In Section 4, Theorem 1 is proved. The section consists of three subsections. In the first one, the formulation of Theorem 1.1 of [19] is given, and several corollaries are deduced from it. The second subsection discusses the connection between the result of K. Neammanee [21], who refined and generalized Uspensky's estimate [36], and the problem of estimating $C_{02}$. It is shown that one can obtain from the result of K. Neammanee the same estimate for $C_{02}$ as ours, but for a much larger $N$. This means that calculating $C_{02}(N)$ requires much more computing time if Neammanee's estimate is used.
In the third subsection, we give, in particular, the proof of Lemma 1.

Proof of Theorem 2
We need the following statement, which we give without proof.

Lemma 2. Let G(x) be a distribution function with a finite number of discontinuity points, and
There exists a discontinuity point Proof. Taking into account the difference in the notations, we obtain the statement of Lemma 3 from [19, Lemma 8].
Further, we will use the following notations: pq ) j , $j = 0, 1, 2$, and the Taylor formula, we get is fulfilled, then with the help of Lemma 3 we arrive at the following bound for $|t| \le \pi/2$, Then, taking into account the elementary equality $a^n - b^n = (a - b)\sum_{j=0}^{n-1} a^j b^{n-1-j}$ (16), we obtain for $|t| \le \pi/2$ that Using the well-known formulas $E|Y|^3 = \frac{4}{\sqrt{2\pi}}$ and $EY^4 = 3$, we deduce from the previous inequality that for $n \ge 2$, Applying Lemma 3 again, we get Moreover, by virtue of the known inequality which holds for every $c > 0$, we have Collecting the estimates (17)-(20), we obtain the statement of Lemma 4. Denote
Lemma 5. For every $n \ge 1$ and $0 \le k \le n$ the following bound holds, where $c_1$ is defined in (13).
Lemma 7. For all $n \ge 1$ and $0 \le k \le n$ the following bound holds, Proof. It is shown in [22] that By Lemma 5, In turn, it follows from Lemma 6 that and $\max_x |x|\varphi(x) = \frac{1}{\sqrt{2\pi e}} < 0.242$, the statement of the lemma follows from (26) and (27).
Lemma 8. For all $n \ge 1$ and $0 \le k \le n$ the following bound holds, where $c_1$, $c_2$ are from (13).
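The elementary constant used at the end of the proof of Lemma 7 can be verified directly: $|x|\varphi(x)$ has its maximum at $x = \pm 1$, where it equals $\varphi(1) = \frac{1}{\sqrt{2\pi e}}$. The following check is a small illustration, with $\varphi$ taken to be the standard normal density (the usual meaning of the symbol, assumed here) and with hypothetical function names.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def grid_max_abs_x_phi(half_width=10.0, nodes_per_unit=1000):
    """Maximum of |x| * phi(x) over a uniform grid on [-half_width, half_width]."""
    n = int(half_width * nodes_per_unit)
    return max(abs(k / nodes_per_unit) * phi(k / nodes_per_unit)
               for k in range(-n, n + 1))


# Closed-form value of the maximum, attained at x = 1 and x = -1.
CLOSED_FORM = 1.0 / math.sqrt(2.0 * math.pi * math.e)

if __name__ == "__main__":
    print(CLOSED_FORM, grid_max_abs_x_phi())
```

Since the grid contains the points $x = \pm 1$, the grid maximum agrees with the closed form up to rounding, and both lie below $0.242$.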
Proof. Similarly to the proof of Lemma 7 we obtain Hence, Since the last summand on the right-hand side of the equality is less than 0.121 pq , then by using (28) we get the statement of the lemma.
Lemma 9. For every $0 < p < 0.5$, $\frac{d}{dp}$ Proof. The lemma follows from the equalities: $\frac{d}{dp}$
Lemma 10. The function $A(p)$ decreases on the interval $(0, 0.5)$.
We have As a result of calculations, we find that the equation $A_3'(p) = 0$ has the single root
Let $f(x)$ be an arbitrary function. Denote by $D^+ f(x)$ and $D^- f(x)$ its right-hand and left-hand derivatives respectively (if they exist).
Lemma 12. Let $g(x) = \max\{f_1(x), f_2(x)\}$, where $f_1(x)$ and $f_2(x)$ are functions differentiable on a finite interval $(a, b)$. Then at every point $x \in (a, b)$ there exist both one-sided derivatives $D^+ g(x)$ and $D^- g(x)$, each of which coincides with either $f_1'(x)$ or $f_2'(x)$.
Proof. Let $x$ be a point such that $f_1(x) \ne f_2(x)$. Then, by continuity, $g$ coincides with one of the functions $f_1$, $f_2$ in a neighbourhood of $x$, so the function $g$ is differentiable at $x$, and in this case the statement of the lemma is trivial. Now let for a point $x \in (a, b)$, First, consider the case f ′ . Then there exists $h_0 > 0$ such that From differentiability of the functions $f_1$ and $f_2$ it follows that for $h \to 0$, Then using (33) we obtain the equality and using (34), Thus, existence of $D^+ g(x)$ and $D^- g(x)$ follows.
Proof of Theorem 2. It follows from the definition of $p'$ that either $0 < p - p' < h/2$ or $0 < p' - p < h/2$. In the first case the statement of the theorem follows from (37) and Lemma 11, and in the second one from (38) and Lemma 11 again.
and the sequence $R_0(p, n) := \frac{\sqrt{n}}{\varrho(p)} R(p, n)$ tends to zero for every $0 < p \le 0.5$, decreasing in $n$.
Denote $E(p, n) = E(p) + R_0(p, n)$. Figure 1 shows the mutual location of the following functions: $E(p, n)$ for $n = 200$ and $800$, $E(p)$ and $T_n(p)|_{n=50}$. Note that, as a consequence of the definition of the binomial distribution, the behavior of these functions is symmetric with respect to $p = 0.5$.
Proof. Since $E(p, n)$ decreases in $n$, we obtain the statement of Corollary A by finding the maximal value of $E(p, N_0)$ directly using a computer.
In order to verify the plausibility of the previous numerical result, we estimate the function $E(p, N_0)$, making preliminary estimates of some of the terms that enter into it. This leads to the following somewhat coarser inequality.
Proof. We separate the proof of (49) into four steps. First we rewrite $R_0(p, n)$ in the following form, In each function $\frac{K_i(p,n)\,\sigma}{\omega(p)}$, $i = 1, 2, 3$, we select the principal term and estimate the remaining ones.
Thus, for $p \in [0.1689, 0.5]$, $n \ge N_0$, we have
Step 4. Now consider the function Let us introduce the following notations: is the coefficient at $\frac{1}{\sigma^2}$ in the expansion of $R(p, n)$ in powers of $\frac{1}{\sigma}$, $D_2(p, n) = \sigma^2 R(p, n)$, where the remainder $R(p, n)$ is defined by equality (46). One can rewrite bound (48) in the following form, It follows from the definitions of $K_1(p, n)$, $K_2(p, n)$, and $K_3(p, n)$ that the coefficient at 1 or, in more detail, First we consider $G_2(p) := \lim_{n\to\infty} G_2(p, n)$. We have Taking into account that we obtain Since $G_2(p)$ decreases for $p < p_1$ and increases for $p > p_1$, the maximum value of this function is achieved either at the left or at the right endpoint of the interval. We have $G_2(0.02) = 9.3336$, $G_2(0.1689) = 5.2273251\ldots$, $G_2(0.5) = 4.5$.
Thus, since the main term of the difference $D_2(p, n) - \frac{G_2(p)}{36\pi}$ has the order $\frac{1}{\sqrt{n}}$. The following bound for $\Delta_n(p)$, simpler than Theorem A, follows from (50) and Table 2.
Corollary C. For all $p \in I = [0.1689, 0.5]$ and $n \ge N_0$,
Remark 3. Corollary C allows one to obtain the same estimate for $C_{02}$ as (4) [36]. To this end we introduce the following notations: $S_n$ is the number of occurrences of an event in a series of $n$ Bernoulli trials with a probability of success $p$, $\mu = np$, For every $x \in \mathbb{R}$, define where $\sigma = \sqrt{npq}$, as before. Uspensky's result can be formulated in the following form.
In 2005 K. Neammanee [21] refined and generalized (55) to the case of non-identically distributed Bernoulli random variables. Let us formulate his result as applied to the case of Bernoulli trials: if $\sigma^2 \ge 100$, then where $a_n^+$, $b_n^-$ are defined by formula (54). It follows from (56) that under the condition $\sigma^2 \ge 100$, We may consider $p \in (0, 0.5]$. Denote for brevity $d = 0.1618$. It follows from (57) and the definition of $G(\cdot)$ that Taking into account that $\max_t |t^2 - 1| e^{-t^2/2} = 1$, we get Denote $x_n = \frac{x-\mu}{\sigma}$. It is easily seen that It follows from (58), (59) that provided that $0 < p \le 0.5$. Thus, Note that our bound (51) is more accurate than (60). To get the bound 0.409954 for $C_{02}$ from (60), we should take $n$ almost five times larger than in (52). Indeed, with the help of a computer we have
Remark 4. In 2014 V. Senatov obtained non-uniform estimates of the approximation accuracy in the central limit theorem and, in particular, generalized Uspensky's result (55) to lattice distributions [24].

Proof of Theorem 1
Before proving Theorem 1, we first prove Lemma 1.
Remark 5. If instead of [10, Theorem 1] we use other modifications of the Berry-Esseen inequality by I. Shevtsova [25], the interval $(0, 0.1689]$ for which Lemma 1 is true can be extended, i.e. one can find $b > 0.1689$ such that the inequality $\max_{p\in(0,b]} T_n(p) < C_E$ holds. This will narrow the interval $I$ (see (12)), which in turn will reduce the computation time on the supercomputer.
Let us indicate such $b$. The estimates found in [25], as applied to the particular case of Bernoulli trials, can be written in the following form, $\Delta_n(p) \le \frac{0.3328}{\sqrt{n}}\,\bigl(\varrho(p) + 0.429\bigr)$.
It is easy to verify that inequality (62) implies b = 0.174, and (63) implies that b = 0.177.
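The value $b = 0.177$ can be reproduced from the estimate with the constants $0.3328$ and $0.429$. The sketch below rests on two assumptions that are not spelled out in the surviving text: that $T_n(p) = \sqrt{n}\,\Delta_n(p)/\varrho(p)$ and that $\varrho(p) = \frac{p^2+q^2}{\sqrt{pq}}$. Under them the estimate gives $T_n(p) \le 0.3328\,(1 + 0.429/\varrho(p))$ for all $n$, and since $\varrho(p)$ decreases on $(0, 0.5]$, it is enough to find the largest $p$ on a grid with step $0.001$ at which this majorant is still below $C_E$. The function names are hypothetical.

```python
import math

# Esseen constant in closed form.
C_E = (3 + math.sqrt(10)) / (6 * math.sqrt(2 * math.pi))


def rho(p):
    """Assumed Lyapunov ratio of the Bernoulli(p) law: (p^2 + q^2) / sqrt(pq)."""
    q = 1.0 - p
    return (p * p + q * q) / math.sqrt(p * q)


def t_majorant(p, a=0.3328, c=0.429):
    """Majorant of T_n(p), uniform in n, implied by Delta_n <= a (rho + c) / sqrt(n)."""
    return a * (1.0 + c / rho(p))


def largest_b(nodes=1000):
    """Largest p = k / nodes in (0, 0.5] with t_majorant(p) < C_E.

    The majorant increases with p on (0, 0.5], so the scan stops at the
    first grid node where the inequality fails.
    """
    best = None
    for k in range(1, nodes // 2 + 1):
        p = k / nodes
        if t_majorant(p) < C_E:
            best = p
        else:
            break
    return best


if __name__ == "__main__":
    print(largest_b())
```

With the default constants the scan stops at $p = 0.177$, in agreement with the value of $b$ stated for (63); other pairs of constants can be tried through the parameters `a` and `c`.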
Then by Lemma 1, this inequality is fulfilled for all p ∈ (0, 0.5] as well. It is not hard to see that the bound (64) is also true for all p ∈ (0.5, 1). Hence, bound (5) implies Theorem 1.