On closeness of two discrete weighted sums

The effect that weighted summands have on each other in approximations of $S=w_1S_1+w_2S_2+\cdots+w_NS_N$ is investigated. Here, the $S_i$ are sums of integer-valued random variables and the $w_i$ denote weights, $i=1,\dots,N$. Two cases are considered: the general case of independent random variables, when their closeness is ensured by the matching of factorial moments, and the case when the $S_i$ have the Markov binomial distribution. The Kolmogorov metric is used to estimate the accuracy of approximation.


Introduction
Let us consider a typical cluster sampling design: the entire population consists of different clusters, and the probability for each cluster to be selected into a sample is known. The sum of sample elements is then equal to $S = w_1S_1 + w_2S_2 + \cdots + w_NS_N$. Here, $S_i$ is the sum of independent identically distributed (iid) random variables (rvs) from the $i$-th cluster. A similar situation arises in actuarial mathematics, when the sum $S$ models the discounted amount of the total net loss of a company, see, for example, [24]. Note that then $S_i$ may be a sum of dependent rvs. Of course, in actuarial models the $w_i$ are also typically random, which makes our research just a first step in this direction. In many papers, the limiting behavior of weighted sums is investigated with the emphasis on weights or tails of distributions, see, for example, [6, 16-18, 23, 25-30] and references therein. We, however, concentrate on the impact of $S - w_iS_i$ on $w_iS_i$. Our research is motivated by the following simple example. Let us assume that $S_i$ is in some sense close to $Z_i$, $i = 1, 2$. Then a natural approximation to $w_1S_1 + w_2S_2$ is $w_1Z_1 + w_2Z_2$. Suppose that we want to estimate the closeness of both sums in some metric $d(\cdot,\cdot)$. The standard approach, which works for the majority of metrics, then gives the triangle inequality
$$d(w_1S_1 + w_2S_2,\, w_1Z_1 + w_2Z_2) \le d(w_1S_1,\, w_1Z_1) + d(w_2S_2,\, w_2Z_2). \quad (1)$$
The triangle inequality (1) is not always useful. For example, let $S_1$ and $Z_1$ have the same Poisson distribution with parameter $n$, and let $S_2$ and $Z_2$ be Bernoulli variables with probabilities $1/3$ and $1/4$, respectively. Then (1) ensures only the trivial order of approximation $O(1)$. Meanwhile, both $S$ and $Z$ can be treated as small (albeit different) perturbations of the same Poisson variable and, therefore, one can expect closeness of their distributions, at least for large $n$. The 'smoothing' effect that other sums have on the approximation of $w_iS_i$ was already observed in [7] (see also references therein).
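This smoothing effect can be checked numerically. The following sketch (an illustration only, not part of the paper; it assumes $w_1 = w_2 = 1$, so that both sums are integer-valued) computes the exact Kolmogorov distance between $\mathcal L(S_1 + S_2)$ and $\mathcal L(Z_1 + Z_2)$ in the Poisson-Bernoulli example above.

```python
import math
from itertools import accumulate

def poisson_pmf(lam, kmax):
    # Poisson(lam) probabilities on 0..kmax, computed iteratively for stability
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def perturbed_cdf(pois_cdf, p):
    # CDF of S' + B, S' Poisson with the given CDF, B ~ Bernoulli(p):
    # P(S' + B <= k) = (1 - p) F(k) + p F(k - 1)
    out, prev = [], 0.0
    for c in pois_cdf:
        out.append((1 - p) * c + p * prev)
        prev = c
    return out

def kolmogorov_distance(n):
    # |L(S1 + S2) - L(Z1 + Z2)|_K with S1, Z1 ~ Poisson(n),
    # S2 ~ Bernoulli(1/3), Z2 ~ Bernoulli(1/4)
    kmax = n + 20 * math.isqrt(n) + 40
    cdf = list(accumulate(poisson_pmf(n, kmax)))
    fs = perturbed_cdf(cdf, 1 / 3)
    fz = perturbed_cdf(cdf, 1 / 4)
    return max(abs(a - b) for a, b in zip(fs, fz))
```

Here the CDF difference at $k$ equals $-\frac{1}{12}e^{-n}n^k/k!$, so the distance is exactly $\frac{1}{12}\max_k e^{-n}n^k/k!$ and decays like $n^{-1/2}$, even though (1) only gives $O(1)$.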
For some general results involving the concentration functions, see, for example, [10,20].
To make our goals more explicit, we need additional notation. Let $\mathbb Z$ denote the set of all integers. Let $\mathcal F$ (resp. $\mathcal F_{\mathbb Z}$, resp. $\mathcal M$) denote the set of probability distributions (resp. distributions concentrated on the integers, resp. finite signed measures) on $\mathbb R$. Let $I_a$ denote the distribution concentrated at a real $a$ and set $I = I_0$. Henceforth, products and powers of measures are understood in the convolution sense. Further, for a measure $M$, we set $M^0 = I$ and $\exp\{M\} = \sum_{k=0}^{\infty} M^k/k!$. We denote by $\widehat M(t)$ the Fourier-Stieltjes transform of $M$; the real part of $\widehat M(t)$ is denoted by $\mathrm{Re}\,\widehat M(t)$. Observe also that $\widehat{\exp\{M\}}(t) = \exp\{\widehat M(t)\}$. We also use $\mathcal L(\xi)$ to denote the distribution of $\xi$.
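The convolution exponential can be illustrated numerically. The following minimal sketch (illustrative only; measure names are assumptions) represents finite signed measures on $\mathbb Z$ as dictionaries and verifies the classical identity $\exp\{\lambda(I_1 - I)\} = \mathrm{Poisson}(\lambda)$ by truncating the series $\sum_{k\ge 0} M^k/k!$.

```python
import math

def convolve(m1, m2):
    # convolution of two finite signed measures on Z (dict: point -> mass)
    out = {}
    for a, x in m1.items():
        for b, y in m2.items():
            out[a + b] = out.get(a + b, 0.0) + x * y
    return out

def measure_exp(m, terms=60):
    # exp{M} = sum_{k>=0} M^k / k! with convolution powers, truncated series
    result, power = {0: 1.0}, {0: 1.0}
    for k in range(1, terms):
        power = convolve(power, m)
        for j, v in power.items():
            result[j] = result.get(j, 0.0) + v / math.factorial(k)
    return result

# exp{lam (I_1 - I)} is the Poisson(lam) distribution
lam = 2.5
pois = measure_exp({1: lam, 0: -lam})
```

The total mass of $\exp\{M\}$ here equals $e^{M\{\mathbb R\}} = e^0 = 1$, matching $\widehat{\exp\{M\}}(0) = \exp\{\widehat M(0)\}$.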
The Kolmogorov (uniform) norm $|M|_K$ and the total variation norm $\|M\|$ of $M$ are defined by
$$|M|_K = \sup_{x\in\mathbb R}\,|M\{(-\infty, x]\}|, \qquad \|M\| = M^+\{\mathbb R\} + M^-\{\mathbb R\},$$
respectively. Here $M = M^+ - M^-$ is the Jordan-Hahn decomposition of $M$. Also, for any two measures $M$ and $V$,
$$|MV|_K \le |M|_K\,\|V\|, \qquad \|MV\| \le \|M\|\,\|V\|.$$
For $F \in \mathcal F$ and $h \ge 0$, Lévy's concentration function is defined by
$$Q(F, h) = \sup_{x\in\mathbb R} F\{[x, x+h]\}.$$
All absolute positive constants are denoted by the same symbol $C$. Sometimes, to avoid possible ambiguities, the constants $C$ are supplied with indices. The constants depending on the parameter $N$ are denoted by $C(N)$. We also assume the usual conventions $\sum_{j=a}^{b} = 0$ and $\prod_{j=a}^{b} = 1$ if $b < a$. The notation $\Theta$ is used for any signed measure satisfying $\|\Theta\| \le 1$. The notation $\theta$ is used for any real or complex number satisfying $|\theta| \le 1$.
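For measures concentrated on the integers, both norms reduce to elementary sums: $\|M\| = \sum_k |M\{k\}|$ and $|M|_K = \sup_k |\sum_{j\le k} M\{j\}|$. A small sketch (helper names are illustrative assumptions), evaluated on $M = \mathcal L(\mathrm{Bernoulli}(1/3)) - \mathcal L(\mathrm{Bernoulli}(1/4))$ from the introductory example:

```python
from itertools import accumulate

def tv_norm(m):
    # ||M|| = M+{R} + M-{R} = sum over k of |M{k}| for a lattice measure
    return sum(abs(v) for v in m.values())

def kolmogorov_norm(m):
    # |M|_K = sup over x of |M{(-inf, x]}|, attained at support points
    keys = sorted(m)
    return max(abs(c) for c in accumulate(m[k] for k in keys))

# M = L(Bernoulli(1/3)) - L(Bernoulli(1/4))
M = {0: 2 / 3 - 3 / 4, 1: 1 / 3 - 1 / 4}
```

Here $\|M\| = 1/6$ and $|M|_K = 1/12$.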

Sums of independent rvs
The results of this section are partially inspired by a comprehensive analytic study of probability generating functions in [12] and by the papers on mod-Poisson convergence, see [2,13,14] and references therein. Assumptions in the above-mentioned papers are made about the behavior of characteristic or probability generating functions. Inversion inequalities are then used to translate their differences into differences of distributions. In principle, mod-Poisson convergence means that if an initial rv is a perturbation of some Poisson rv, then their distributions must be close. Formally, it is required that $\exp\{-\lambda_n(e^{it}-1)\}f_n(t)$ have a limit for some sequence of Poisson parameters $\lambda_n$, as $n \to \infty$. Here, $f_n(t)$ is the characteristic function of the investigated rv. Division by a certain Poisson characteristic function is one of the crucial steps in the proof of Theorem 2.1 below, which makes it applicable to rvs satisfying the mod-Poisson convergence definition, provided they can be expressed as sums of independent rvs. Though we use factorial moments, similarly to Section 7.1 in [2], our work is much closer in spirit to [21], where general lemmas about the closeness of lattice measures are proved.
In this section, we consider the general case of independent non-identically distributed rvs forming a triangular array (a scheme of series). Let $S_i = X_{i1} + X_{i2} + \cdots + X_{in_i}$ and $Z_i = Z_{i1} + Z_{i2} + \cdots + Z_{in_i}$, $i = 1, 2, \dots, N$. We assume that all the $X_{ij}$, $Z_{ij}$ are mutually independent and integer-valued. Observe that, in general, the weighted sums $w_iS_i$ are not integer-valued and, therefore, the standard methods of estimation for lattice rvs do not apply. Note also that, since any infinitely divisible distribution can be expressed as a sum of iid rvs, Poisson, compound Poisson and negative binomial rvs can be used as $Z_i$.
The distribution of $X_{ij}$ (resp. $Z_{ij}$) is denoted by $F_{ij}$ (resp. $G_{ij}$). The closeness of characteristic functions will be determined by the closeness of the corresponding factorial moments. Though it is proposed in [2] to use standard factorial moments even for rvs taking negative values, we think that the right-hand and left-hand factorial moments, already used in [21], are more natural characteristics. Let, for $k = 1, 2, \dots$ and any $F \in \mathcal F_{\mathbb Z}$, the right-hand and left-hand factorial moments $\nu_k^+(F)$ and $\nu_k^-(F)$ be defined as in [21]. For the estimation of the remainder terms we also need the notation $u_{ij} := \min(\mathrm{Var}(X_{ij}), \mathrm{Var}(Z_{ij}))$ and related quantities; for the last equality, see (1.9) and (5.15) in [5]. Next we formulate our assumptions: for some fixed integer $s \ge 1$ and all $i = 1, \dots, N$, $j = 1, \dots, n_i$, conditions (2)-(4) hold. Now we are in a position to formulate the main result of this section.
If, in addition, $s$ is even, then the refined estimate (6) holds. The factor $\big(\sum_{i=1}^{N}\sum_{j=1}^{n_i} u_{ij}\big)^{-1/2}$ estimates the impact of $S$ on the approximation of $w_iS_i$. The estimate (6) takes care of a possible symmetry of the distributions.
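Factorial moment matching can be illustrated numerically. The sketch below uses the ordinary factorial moments $EX(X-1)\cdots(X-k+1)$ for distributions on the non-negative integers (the paper's one-sided moments $\nu_k^{\pm}$ follow [21] and are not reproduced here); it shows that a binomial law and a Poisson law with the same mean match in the first factorial moment but differ in the second.

```python
import math

def factorial_moment(pmf, k):
    # E X(X-1)...(X-k+1) for a distribution on the non-negative integers;
    # math.perm(m, k) = m!/(m-k)! and equals 0 when k > m
    return sum(math.perm(m, k) * p for m, p in pmf.items())

def binom_pmf(n, p):
    return {m: math.comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)}

def poisson_pmf(lam, kmax=60):
    return {m: math.exp(-lam) * lam**m / math.factorial(m) for m in range(kmax)}
```

For Binomial$(10, 0.3)$ and Poisson$(3)$ the first factorial moments coincide (both equal $3$), while the second ones are $10\cdot 9\cdot 0.09 = 8.1$ and $3^2 = 9$, respectively.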
If, in each sum $S_i$ and $Z_i$, all the rvs are identically distributed, then we can get rid of the factor containing the variances. We say that condition (ID) is satisfied if, for each $i = 1, 2, \dots, N$, all rvs $X_{ij}$ and $Z_{ij}$ ($j = 1, \dots, n_i$) are iid with distributions $F_i$ and $G_i$, respectively. Observe that, if condition (ID) is satisfied, then the characteristic functions of $S$ and $Z$ are equal to $\prod_{i=1}^{N}(\widehat F_i(w_it))^{n_i}$ and $\prod_{i=1}^{N}(\widehat G_i(w_it))^{n_i}$, respectively. We also use the notation $u_i$ instead of $u_{ij}$, since now $u_{i1} = u_{i2} = \cdots = u_{in_i}$.
Theorem 2.2. Let the assumptions (2)-(4) and the condition (ID) hold. Then the estimate (7) holds.

How does Theorem 2.1 compare to the known results? In [4], compound Poisson-type approximations for non-negative iid rvs in each sum were considered under the additional Franken-type condition (8), see [8]. Similar assumptions were used in [7,21]. Observe that Franken's condition requires almost all of the probability mass to be concentrated at 0 and 1. Meanwhile, Theorems 2.1 and 2.2 hold under much milder assumptions and, as demonstrated in the example below, can be useful even if (8) is not satisfied. Therefore, even in the case of one sum, when $N = 1$, our results are new.

Example. We assume that $n_2 = n$ and $n_1 = \lceil\sqrt n\,\rceil$ is the smallest integer greater than or equal to $\sqrt n$. Then $\nu_k^+(F_j) = \nu_k^+(G_j)$, $k = 1, 2, 3$, $\beta_4^+(F_j, G_j) = 9$ and $u_j = 3/8$ ($j = 1, 2$). Therefore, the estimate of Theorem 2.2 applies. In this case, Franken's condition (8) is not satisfied.

Next we apply Theorem 2.2 to the negative binomial distribution. For real $r > 0$ and $0 < \overline p < 1$, let $\xi \sim NB(r, \overline p)$ denote the distribution defined by (9).

Corollary 2.1. Let the assumptions of Theorem 2.2 hold with $X_{1j}$ concentrated on the non-negative integers and let $EX_{1j}^3 < \infty$ ($j = 1, \dots, N$). Let $G_j$ be defined by (9). Then (10) holds. Here $\tilde u_j$ denotes the analogue of $u_j$. Then the accuracy of approximation in (10) is of the order $O((n_1 + \cdots + n_N)^{-1/2})$.

Sums of Markov Binomial rvs
We have already mentioned that it is not always natural to assume independence of rvs. In this section, we still assume that $S = w_1S_1 + w_2S_2 + \cdots + w_NS_N$ with mutually independent $S_i$. On the other hand, we assume that each $S_i$ has a Markov binomial (MB) distribution, that is, $S_i$ is a sum of Markov-dependent Bernoulli variables. Such a sum $S$ has a slightly more realistic interpretation in actuarial mathematics. Assume, for example, that we have $N$ insurance policy holders, the $i$-th of whom can get ill during an insurance period and be paid a claim $w_i$. The health of a policy holder depends on the state of her/his health in the previous period. Therefore, we have a natural two-state (healthy, ill) Markov chain. Then $S_i$ is the aggregate claim of the $i$-th insurance policy holder after $n_i$ periods, while $S$ is the aggregate claim of all holders. The limiting behavior of the MB distribution is a popular topic among mathematicians, discussed in numerous papers, see, for example, [3,9,11], and references therein.
Let $\xi_{i0} = 0, \xi_{i1}, \dots, \xi_{in_i}, \dots$ ($i = 1, 2, \dots, N$) be a Markov chain with transition probabilities determined by the parameters $p_i, q_i, \overline p_i, \overline q_i$. The distribution of $S_i = \xi_{i1} + \cdots + \xi_{in_i}$ ($n_i \in \mathbb N$) is called the Markov binomial distribution with parameters $p_i, q_i, \overline p_i, \overline q_i, n_i$. The definition of an MB rv slightly differs from paper to paper; we use the one from [3]. Note that the Markov chain considered above is not necessarily stationary. Furthermore, the distribution of $w_iS_i$ is denoted by $H_{in} = \mathcal L(w_iS_i)$. For the approximation of $H_{in}$ we use a signed compound Poisson (CP) measure with matching mean and variance. Such signed CP approximations usually outperform both the normal and the CP approximations, see, for example, [1,3,20]. Observe that $\widehat Y_i(t) + 1$ is the characteristic function of a geometric distribution. Let $Y_i$ be the measure corresponding to $\widehat Y_i(t)$. For the approximation of $H_{in}$ we use the signed CP measure $D_{in}$. The CP limit occurs when $nq_i \to \lambda$, see, for example, [3]. Therefore, we assume $q_i$ to be small, though not necessarily vanishing. Let, for some fixed integer $k_0 \ge 2$, assumptions (12) hold. In principle, the first assumption in (12) can be dropped, but then exponentially vanishing remainder terms appear in all results, making them very complicated.
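The MB distribution is easy to compute exactly by dynamic programming over the chain state. The sketch below uses one common parametrization of the two-state chain, $P(\xi_{j+1}=1\mid\xi_j=0)=q$ and $P(\xi_{j+1}=1\mid\xi_j=1)=p$; the paper's exact parametrization follows [3], so these names are illustrative assumptions. When $p = q$ the summands become iid and the MB law reduces to the binomial distribution.

```python
import math

def markov_binomial_pmf(n, q, p, start=0):
    # exact law of S = xi_1 + ... + xi_n for the two-state chain started at
    # `start`, with P(next = 1 | current = 0) = q, P(next = 1 | current = 1) = p
    dist = {0: {}, 1: {}}          # state -> {partial sum: probability}
    dist[start][0] = 1.0
    for _ in range(n):
        new = {0: {}, 1: {}}
        for state, go1 in ((0, q), (1, p)):
            for s, pr in dist[state].items():
                new[1][s + 1] = new[1].get(s + 1, 0.0) + pr * go1
                new[0][s] = new[0].get(s, 0.0) + pr * (1 - go1)
        dist = new
    out = {}
    for part in dist.values():     # marginalize over the final state
        for s, pr in part.items():
            out[s] = out.get(s, 0.0) + pr
    return out
```

With $p = q$, `markov_binomial_pmf(n, p, p)` coincides with the Binomial$(n, p)$ law.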
Remark 3.1. Let all $q_i \le C$, $i = 1, \dots, N$. Then the right-hand side of (13) admits an obvious majorant. Therefore, even in this case, the result is comparable with the Berry-Esseen theorem.

Auxiliary results
Lemma 4.1. Let $h > 0$, $W \in \mathcal M$, $W\{\mathbb R\} = 0$, $U \in \mathcal F$ and $|\widehat U(t)| \le C\widehat V(t)$ for $|t| \le 1/h$ and some symmetric distribution $V$ having a non-negative characteristic function. Then

Lemma 4.1 is a version of Le Cam's smoothing inequality, see Lemma 9.3 in [5] and Lemma 3 on p. 402 in [15].
If, in addition, $\widehat F(t) \ge 0$, then a sharper estimate holds. Lemma 4.2 contains well-known properties of Lévy's concentration function, see, for example, Chapter 1 in [19] or Section 1.5 in [5].
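For a distribution on the integers, Lévy's concentration function can be evaluated directly. A small illustrative sketch (helper names are assumptions), which also shows the well-known $O(n^{-1/2})$ decay of $Q(F^n, 0)$ for Bernoulli convolutions, here via the binomial law:

```python
import math

def concentration(pmf, h):
    # Levy concentration function Q(F, h) = sup over x of F{[x, x+h]} for a
    # distribution on Z; for integer h the supremum is attained at integer x
    keys = sorted(pmf)
    return max(sum(p for k, p in pmf.items() if x <= k <= x + h) for x in keys)

def binom_pmf(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
```

For $p = 1/2$, $Q(\mathrm{Bin}(n, 1/2), 0) = \binom{n}{\lfloor n/2\rfloor}2^{-n} \approx \sqrt{2/(\pi n)}$, illustrating the $n^{-1/2}$ decay.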
An expansion in the left-hand and right-hand factorial moments of Fourier-Stieltjes transforms is given in [21]. Here we need its analogue for distributions.
Proof. For measures concentrated on the non-negative integers, (18) is given in [5], Lemma 2.1. Observe that the distribution $F$ can be expressed as a mixture $F = p^+F^+ + p^-F^-$ of distributions $F^+$, $F^-$ concentrated on the non-negative and on the negative integers, respectively. Then Lemma 2.1 from [5] can be applied in turn to $F^+$ and to $F^-$ (with $I_{-1}$). The remainder terms can be combined, since $(I_{-1} - I) = I_{-1}(I - I_1) = (I_1 - I)\Theta$.
Lemma 4.6. Let $M \in \mathcal M$ be concentrated on $\mathbb Z$ and let $\sum_{k\in\mathbb Z}|k|\,|M\{k\}| < \infty$. Then, for any $a \in \mathbb R$ and $b > 0$, the corresponding inversion inequality holds. Lemma 4.6 is a well-known inversion inequality for lattice distributions. Its proof can be found, for example, in [5], Lemma 5.1.
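The inversion formula behind such inequalities, $P(X = k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-itk}\widehat F(t)\,dt$ for an integer-valued $X$, can be checked numerically. The sketch below (illustrative only) recovers a binomial pmf from its characteristic function by a Riemann sum, which is exact up to rounding whenever the support is smaller than the grid.

```python
import cmath
import math

def invert(char_fn, k, npts=512):
    # P(X = k) = (1/(2*pi)) * integral over [-pi, pi] of e^{-itk} f(t) dt,
    # approximated by a uniform Riemann sum with npts nodes
    total = 0.0
    for j in range(npts):
        t = -math.pi + 2 * math.pi * j / npts
        total += (cmath.exp(-1j * t * k) * char_fn(t)).real
    return total / npts

# characteristic function of Binomial(10, 0.3)
n, p = 10, 0.3
f = lambda t: ((1 - p) + p * cmath.exp(1j * t)) ** n
```

`invert(f, 3)` agrees with $\binom{10}{3}0.3^3\,0.7^7$ to machine precision, since the uniform grid sum aliases only support points that are `npts` apart.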
Proof. The statements follow from Lemma 5.4, Lemma 5.1 and the relations given on pp. 1131-1132 in [3]. The estimate for $e^{-C_in_i}$ follows from the first assumption in (12) and the simple estimate
$$e^{-C_in_i} \le e^{-C_in_i/2}\,e^{-C_in_iq_i/2} \le C(k_0)\,n^{-k_0}.$$
Proof of Theorem 2.1. Let $F_{ij,w}$ (resp. $G_{ij,w}$) denote the distribution of $w_iX_{ij}$ (resp. $w_iZ_{ij}$). Note that $\widehat F_{ij,w}(t) = \widehat F_{ij}(w_it)$. We begin with the triangle inequality and an analogous second estimate; for the sake of brevity, we introduce shorthand notation for the arising measures. Combining both estimates with Lemma 4.4, we get (21). Let $|t| \le \pi/\max_i w_i$. Then it follows from (19) that the characteristic functions admit the required exponential bounds. Observe that $e^{u_{ij}\sin^2(tw_i/2)/\pi} \le e^{1/\pi} = C$. Next, we define the CP measure $\exp\{L\}$. It is not difficult to check that $\exp\{L\}$ is a CP distribution with a non-negative characteristic function. Also, by the definition of the exponential measure, $\exp\{-L\}$, which can be called the inverse of $\exp\{L\}$, is a signed measure with finite total variation. The next step is similar to the definition of mod-Poisson convergence. We apply Lemma 4.1 with $h = \max_i w_i/\pi$, $U_1 = \exp\{L\}$ and $W_1 = (I_{w_i} - I)^{s+1}E_{ij}T_i\exp\{-L\}$. By Lemma 4.2 and by (22) and (23), the concentration factor is of the required order; see (24). It remains to estimate $\|W_1\|$. By the properties of the total variation norm, the estimate splits into the three norms in (27). The first norm in (27) is bounded by $\exp\{\tfrac{1}{8}u_{ij}[\,\|I_{w_i} - I\| + \|I_{-w_i} - I\|\,]\} \le \exp\{1/2\}$. The total variation norm is invariant with respect to scale; therefore, without loss of generality, we can switch to $w_i = 1$, in which case we use the notations $\Phi_{ik}$, $\Psi_{ik}$. Then, again employing the inverse CP measures, we apply Lemma 4.6 with $a = \mu_{ik}$, where $\mu_{ik}$ is the mean of $F_{ik}$ and, due to assumption (3), also of $G_{ik}$. It follows from (19) that $\Delta(t)$ admits the required bound. For the estimation of $|\Delta'(t)|$, observe that, by (19) and (20), the derivatives $|(\widehat\Phi_{ik}(t)e^{-it\mu_{ik}})'|$ and $|(\widehat\Psi_{ik}(t)e^{-it\mu_{ik}})'|$ are bounded by expressions containing the factor $u_{ik}\pi|\sin(t/2)|\,e^{(u_{ik}/2\pi)\sin^2(t/2)}$. A direct calculation, taking into account the previous two estimates, yields a bound involving $u_{ik}\sin^2(t/2)$. From Lemma 4.6, the second norm in (27) is estimated, see (28). The remaining two norms in (27) can be estimated similarly, see (29). Substituting (28) and (29) into (27), we obtain (30). Combining (30) with (25), (26) and (24), and substituting the resulting estimate into (21), we complete the proof of (5).
The proof of (6) is very similar and, therefore, omitted.
Proof of Theorem 2.2. We outline only the differences from the proof of Theorem 2.1. No convolution with the inverse Poisson measure is required, since we have the powers $F_i^{n_i}$, which can be used in Lévy's concentration function. Let $\lfloor a\rfloor$ denote the integer part of $a$ and let $a(k) := \lfloor(k-1)/2\rfloor$, $b(k) := \lfloor(n_i-k)/2\rfloor$. Then, as in the proof of Theorem 2.1, we obtain the corresponding bound. Here $F_{iw}$ and $G_{iw}$ denote the distributions of $w_iX_{ij}$ and $w_iZ_{ij}$, respectively. We can apply Lemma 4.1 to the Kolmogorov norm given above, taking as $W$ the measure $(I_{w_i} - I)^{s+1}$ multiplied by suitable powers of $F_{iw}$ and $G_{iw}$. The remaining distribution is used in Lévy's concentration function. The Fourier-Stieltjes transform $\widehat W(t)/t$ is estimated exactly as in the proof of Theorem 2.1. The total variation norm of any distribution is equal to 1; therefore $\|W\| \le \|I_{w_i} - I\|^{s+1} \le 2^{s+1}$, and we can avoid the application of Lemma 4.6.

Proof of Theorem 3.1
The proof is similar to the one given in [22]. Let $A_i = \exp\{n_i\gamma_iY_i/30\}$. From Lemma 4.7, it follows that (31) holds. Here we have added an index to $\Theta_i$, emphasizing that they might be different for different $i$. As usual, we assume that a convolution over an empty set of indices equals $I$. Both summands on the right-hand side of (31) are estimated similarly. Next we apply Lemma 4.1 with $W = Y_i$, $h = \max_i w_i/\pi$ and $V$ given by
$$\widehat V(t) = \exp\Big\{-\frac{1}{90}\Big[\sum_{k=1}^{i-1}\max(n_kq_k, 1)\sin^2(tw_k/2) + \sum_{k=i}^{N}\max(n_kq_k, 1)\sin^2(tw_k/2)\Big]\Big\}.$$
Therefore, using Lemma 4.2, we prove a bound in terms of $\sum_{k=1}^{i-1}\max(n_kq_k, 1) + \sum_{k=i+1}^{N}\max(n_kq_k, 1)$. Next, observe that, by Lemma 4.7, the remaining factor admits the required estimate.
The estimation of the second sum in (31) is almost identical and, therefore, omitted.