Asymptotic genealogies for a class of generalized Wright-Fisher models

We study a class of Cannings models with population size $N$ having a mixed multinomial offspring distribution with random success probabilities $W_1,\ldots,W_N$ induced by independent and identically distributed positive random variables $X_1,X_2,\ldots$ via $W_i:=X_i/S_N$, $i\in\{1,\ldots,N\}$, where $S_N:=X_1+\cdots+X_N$. The ancestral lineages are hence based on a sampling with replacement strategy from a random partition of the unit interval into $N$ subintervals of lengths $W_1,\ldots,W_N$. Convergence results for the genealogy of these Cannings models are provided under regularly varying assumptions on the tail distribution of $X_1$. In the limit several coalescent processes with multiple and simultaneous multiple collisions occur. The results extend those obtained by Huillet (2014) for the case when $X_1$ is Pareto distributed and complement those obtained by Schweinsberg (2003) for models where one samples without replacement from a supercritical branching process.


Introduction
Let X 1 , X 2 , . . . be independent copies of a random variable X taking values in (0, ∞). For N ∈ N := {1, 2, . . .} define S N := X 1 + · · · + X N and W i := X i /S N , i ∈ {1, . . . , N }. The weights W 1 , . . . , W N are exchangeable random variables with W 1 + · · · + W N = 1. In particular, E(W i ) = 1/N , i ∈ {1, . . . , N }. Consider the Cannings model [6,7] with population size N and non-overlapping generations such that, conditional on W 1 , . . . , W N , the offspring sizes ν 1 , . . . , ν N have a multinomial distribution with parameters N and W 1 , . . . , W N . Thus, the offspring distribution is where (x) 0 := 1 and (x) k := x(x − 1) · · · (x − k + 1) for x ∈ R and k ∈ N. In [15] this model is studied for the case when X is Pareto distributed. If X is gamma distributed with density x → x r−1 e −x /Γ(r), x > 0, for some r > 0, then (W 1 , . . . , W N ) is symmetric Dirichlet distributed with parameter r, leading to the Cannings model with offspring distribution iN [rN ] N , i 1 , . . . , i N ∈ N 0 with i 1 + · · · + i N = N , where [x] 0 := 1 and [x] i := x(x + 1) · · · (x + i − 1) for x ∈ R and i ∈ N. This Dirichlet multinomial model has been studied extensively in the literature (see, for example, Griffiths and Spanò [13]). In a series of papers [16,17,19] a subclass of Cannings models, called conditional branching process models in the spirit of Karlin and McGregor [23,24], has been investigated, whose offspring distributions are (by definition) obtained by assuming that P(X 1 + · · · + X N = N ) > 0 and conditioning on the event that X 1 + · · · + X N = N . This construction based on conditioning is rather different from the construction based on sampling from a random partition of the unit interval we are dealing with in this article. Note however that several concrete examples (such as the classical Wright-Fisher model and the above mentioned Dirichlet multinomial model) can be constructed in both ways, either by sampling or by conditioning. 
For example, the Dirichlet multinomial model is obtained by taking N independent and identically distributed negative binomial random variables X 1 , . . . , X N with parameter r > 0 and p ∈ (0, 1), so with distribution P(X 1 = k) = r+k−1 k p r (1 − p) k , k ∈ N 0 , and conditioning on the event that X 1 + · · · + X N = N . The closely related model studied by Schweinsberg [37] differs from ours, since sampling is performed without replacement from a discrete super-critical Galton-Watson branching process, as explained in [37,Section 1.3]. In that model, X is integer valued and satisfies E(X) > 1. In our model, X does not need to be integer valued and its mean is allowed to be less than 1. Moreover, the sampling in our multinomial model is with replacement, whereas in Schweinsberg's model it is without replacement. The same multinomial scheme with an additional dormancy mechanism is considered in the recent work of Cordero et al. [8]. A class of Dirichlet models in the domain of attraction of the Kingman coalescent is also studied in two recent works of Boenkost et al. [4,5] with an emphasis on Haldane's formula [14]. We refer the reader to Athreya [1] for some more information on Haldane's formula. Fix n ∈ {1, . . . , N } and sample n individuals from the current generation. For r ∈ N 0 define a random partition Π ) r∈N0 , called the discrete-time n-coalescent, takes values in the space P n of partitions of {1, . . . , n}. As in [15] we are interested in the limiting behaviour of the discrete-time n-coalescent as the total population size N tends to infinity. It is easily seen (and well-known) that the discrete-time n-coalescent is a time-homogeneous Markovian process. The transition probabilities p ππ ′ := P(Π if each block of π ′ is a union of some blocks of π, where j := |π ′ | denotes the number of blocks of π ′ and k 1 , . . . , k j are the group sizes of merging blocks of π. Note that Φ (N ) j (k 1 , . . . , k j ) is defined for all N, j, k 1 , . . . 
, k j ∈ N. Since the random variables W 1 , . . . , W N are exchangeable and satisfy W 1 + · · · + W N = 1, it follows for all N, j, k 1 , . . . , k j ∈ N with j ≤ N that Multiplication by (N ) j (= 0 for j > N ) shows that the consistency relation holds for all N, j, k 1 , . . . , k j ∈ N. Moreover, for all j, l ∈ N with j ≥ l and all k 1 , . . . , k j , m 1 , . . . , m l ∈ N with k 1 ≥ m 1 , . . . , k l ≥ m l , the monotonicity relation holds. Note that (5) follows from (4) by induction on the difference d := j − l ∈ N 0 . We refer the reader to [30, Definition 2.2] and the remark thereafter for similar statements and proofs for the full class of Cannings models. Choosing j = 1 and k 1 = 2 in (3) shows that two individuals share a common ancestor one generation backward in time with probability c N := Φ (N ) 1 (2) = N E(W 2 1 ), the so-called coalescence probability. We also introduce the effective population size N e := 1/c N . Note that c N = N E(W 2 1 ) ≥ N (E(W 1 )) 2 = 1/N or, equivalently, N e ≤ N . All Cannings models having an effective population size strictly larger than N (such as the Moran model having effective population size N e = N (N − 1)/2 > N for N ≥ 4 and most of the extended Moran models studied by Eldon and Wakeley [11] and Huillet and Möhle [18]) therefore do not belong to the class of models we are dealing with in this article. General results for Cannings models concerning the convergence of their genealogical tree to an exchangeable coalescent process as the total population size tends to infinity are provided in [32]. For information on the theory of exchangeable coalescent processes we refer the reader to Pitman [33], Sagitov [34] and Schweinsberg [35,36]. Coalescents with multiple collisions (Λ-coalescents) are Markovian stochastic processes taking values in the set of partitions of N. They are characterized by a finite measure Λ on the unit interval. 
Important examples are Dirac-coalescents, where Λ = δ a is the Dirac measure at a given point a ∈ [0, 1], including the prominent Kingman coalescent (Kingman [25,26,27]), where Λ = δ 0 is the Dirac measure at 0, and the star-shaped coalescent, where Λ = δ 1 . Other important examples are beta coalescents, where Λ = β(a, b) is the beta distribution with parameters a, b > 0, including the Bolthausen-Sznitman coalescent, where Λ is the uniform distribution on the unit interval (a = b = 1). The full class of exchangeable coalescent processes (Ξ-coalescents) allowing for simultaneous multiple collisions of ancestral lineages is characterized by a finite measure Ξ on the infinite simplex ∆ : An example is the two-parameter Poisson-Dirichlet coalescent with parameters α > 0 and θ > −α, where the characterizing measure ν(dx) : i on ∆ is (by definition) the Poisson-Dirichlet distribution ν = PD(α, θ) with parameters α > 0 and θ > −α. For more information on the Poisson-Dirichlet coalescent we refer the reader to Section 6 of [31]. In most studies, continuous-time coalescent processes (Π t ) t∈T with index set T = [0, ∞) are considered. Note however that all Ξ-coalescents can as well be introduced with discrete time T = N 0 . In this case one speaks about a discrete-time Ξ-coalescent (Π r ) r∈N0 . The following terminology is taken from [16, Definition 2.1].
Definition 1 (i) A Cannings model is said to be in the domain of attraction of a continuous-time coalescent Π = (Π t ) t≥0 if for each sample size n ∈ N the time-scaled ancestral process (Π denotes the restriction of Π to a sample of size n. (ii) Analogously, a Cannings model is said to be in the domain of attraction of a discrete-time coalescent Π = (Π r ) r∈N0 if for each sample size n ∈ N the ancestral process (Π r ) r∈N0 denotes the restriction of Π to a sample of size n.
Conditions on the tails of the distribution of X are provided which ensure that the population model with offspring distribution (1) is in the domain of attraction of some exchangeable coalescent process. The tail condition is of the standard form P(X > x) ∼ x −α ℓ(x) as x → ∞, where α ≥ 0 and ℓ is a function slowly varying at ∞. The results are collected in Theorem 1 in Section 2. It turns out that the three parameter values α ∈ {0, 1, 2} are boundary cases. Consequently, six different regimes (α > 2, α = 2, α ∈ (1, 2), α = 1, α ∈ (0, 1) and α = 0) are considered, leading to different limiting behaviours of the ancestral process. Theorem 1 also provides the asymptotics of the coalescence probability c N as N → ∞ for all six cases. In Section 3 some illustrative examples are provided, including the case studied in [15] when X is Pareto distributed. The proofs are provided in the main Section 4. They are based on general convergence-to-the-coalescent theorems for Cannings models provided in [32] and combine (Abelian and Tauberian) arguments from the theory of regularly varying functions in the spirit of Karamata [20,21,22] with techniques used by Huillet [15] for the Pareto case and by Schweinsberg [37] for the related model where the sampling is performed without replacement.
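The construction described in this introduction can be sketched in a few lines of Python. The following is a minimal illustration, not part of the original analysis: the function names, the seed and the choice of a gamma distributed X with parameter r = 1 are assumptions made for the example. It draws the weights W i = X i /S N , samples the multinomial offspring sizes by letting each child choose its parent with replacement, and compares a Monte Carlo estimate of c N = N E(W 2 1 ) with the value (r + 1)/(rN + 1), which follows from the standard second moment of the symmetric Dirichlet distribution.

```python
import random
from collections import Counter

def sample_weights(n_pop, draw_x, rng):
    """Draw X_1..X_N i.i.d. via draw_x and return W_i = X_i / S_N."""
    xs = [draw_x(rng) for _ in range(n_pop)]
    total = sum(xs)
    return [x / total for x in xs]

def offspring_counts(weights, rng):
    """Multinomial(N; W_1..W_N) offspring sizes: each of the N children
    picks its parent independently, with replacement, with probabilities W_i."""
    n_pop = len(weights)
    parents = rng.choices(range(n_pop), weights=weights, k=n_pop)
    counts = Counter(parents)
    return [counts.get(i, 0) for i in range(n_pop)]

def estimate_c_n(n_pop, draw_x, rng, reps=2000):
    """Monte Carlo estimate of c_N = N E(W_1^2) = E(W_1^2 + ... + W_N^2)."""
    acc = 0.0
    for _ in range(reps):
        acc += sum(w * w for w in sample_weights(n_pop, draw_x, rng))
    return acc / reps

# Illustrative assumptions: N = 50, X gamma distributed with shape r = 1.
rng = random.Random(12345)
N, r = 50, 1.0
gamma_draw = lambda g: g.gammavariate(r, 1.0)

weights = sample_weights(N, gamma_draw, rng)
nu = offspring_counts(weights, rng)
c_hat = estimate_c_n(N, gamma_draw, rng)
c_exact = (r + 1) / (r * N + 1)  # Dirichlet second moment, assumed standard fact
print(sum(nu), c_hat, c_exact, 1 / N)
```

Since W 2 1 + · · · + W 2 N ≥ 1/N holds for every realisation of the weights, the estimate can never fall below 1/N, in line with the bound N e ≤ N stated above.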

Results
For most of the results it is assumed that there exists a constant α ≥ 0 and a function ℓ : (0, ∞) → (0, ∞) slowly varying at ∞ such that Our main result (Theorem 1) clarifies the limiting behaviour of the ancestral structure of the Cannings model with offspring distribution (1) as the total population size N tends to infinity under the assumption (6). It turns out that the parameter values α ∈ {0, 1, 2} are boundary cases. It is hence natural to distinguish six regimes corresponding to the parameter ranges α > 2, α = 2, α ∈ (1, 2), α = 1, α ∈ (0, 1) and α = 0. In order to state the result it is convenient to introduce the function ℓ * : (1, ∞) → (0, ∞) via Note that ℓ * is non-decreasing, slowly varying at ∞ and satisfies ℓ(x)/ℓ * (x) → 0 as x → ∞, see for example Bingham [3] and the remarks thereafter. More precisely, for every λ > 0, as where the convergence holds by the uniform convergence theorem for slowly varying functions. Thus, ℓ * is a de Haan function (with ℓ-index 1) and hence slowly varying. For general information on de Haan theory we refer the reader to Chapter 3 of [3].
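The de Haan property of ℓ * can be checked numerically in a simple case. The sketch below is only an illustration under two assumptions that are not taken verbatim from the text: that ℓ * (x) is the integral of ℓ(t)/t over [1, x] (the form consistent with the Yule-Simon computation in Section 3), and that ℓ(x) = (log x) β for x > 1 with β = 1, for which ℓ * (x) = (log x) β+1 /(β + 1) in closed form. The function names are hypothetical.

```python
import math

def ell(x, beta):
    """Illustrative slowly varying function ell(x) = (log x)^beta, x > 1."""
    return math.log(x) ** beta

def ell_star(x, beta):
    """Closed form of the integral of (log t)^beta / t over [1, x]."""
    return math.log(x) ** (beta + 1) / (beta + 1)

def de_haan_ratio(x, lam, beta):
    """(ell_star(lam * x) - ell_star(x)) / ell(x), expected to approach log(lam)."""
    return (ell_star(lam * x, beta) - ell_star(x, beta)) / ell(x, beta)

beta, lam = 1.0, 3.0
exponents = (5, 20, 100)
ratios = [de_haan_ratio(10.0 ** k, lam, beta) for k in exponents]
small_o = [ell(10.0 ** k, beta) / ell_star(10.0 ** k, beta) for k in exponents]

for k, ratio, quotient in zip(exponents, ratios, small_o):
    print(f"x = 1e{k}: ratio = {ratio:.5f} (log lam = {math.log(lam):.5f}), "
          f"ell/ell_star = {quotient:.5f}")
```

For this choice the ratio equals log λ + (log λ) 2 /(2 log x), so the convergence is only logarithmic in x, and ℓ(x)/ℓ * (x) = (β + 1)/ log x tends to 0 at the same slow rate.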
The main (and only) result of this article is the following.   (iv) If (6) holds with α = 1, then the model is in the domain of attraction of the continuous-time Bolthausen-Sznitman coalescent. If (a N ) N ∈N is a sequence of positive real numbers satisfying ℓ * (a N ) ∼ a N /N as N → ∞, where ℓ * is defined via (7), then the coalescence (v) If (6) holds with α ∈ (0, 1), then the model is in the domain of attraction of the discrete-time Ξ-coalescent, where the characterising measure ν(dx) : (vi) If (6) holds with α = 0, then the model is in the domain of attraction of the discrete-time star-shaped coalescent and the coalescence probability satisfies c N → 1 as N → ∞.
In particular, for the first four cases (i) -(iv), c N → 0 as N → ∞.

Table (columns: Condition, Limiting coalescent, Coalescence probability)

Assume now in addition that α = 1. In this case, in part (iv) of Theorem 1 one can choose a 1 := 1 and a N := CN log N , N ∈ N \ {1}. The coalescence probability thus satisfies c N ∼ CN/a N ∼ 1/ log N , in agreement with Proposition 6 of Huillet [15] for the Pareto example P(X > x) = 1/x, x > 1. The same asymptotics for the coalescence probability holds for the related model considered by Schweinsberg (see [37,Lemma 16]) and, for example, when X is discrete taking the value k ∈ N with probability P(X = k) = 1/(k(k + 1)).
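The choice a N := CN log N satisfies the defining relation ℓ * (a N ) ∼ a N /N only asymptotically, and the convergence is slow. The following sketch makes this visible under the assumption that ℓ ≡ C is constant and that ℓ * (x) = C log x, which is the integral of ℓ(t)/t over [1, x]; the function names and the value C = 1 are illustrative.

```python
import math

def ell_star_const(x, c):
    """ell_star for the constant slowly varying function ell = c: c * log x."""
    return c * math.log(x)

def a_n(n, c):
    """Scaling sequence a_N = C N log N (for N >= 2)."""
    return c * n * math.log(n)

def defining_ratio(n, c):
    """ell_star(a_N) / (a_N / N); part (iv) of Theorem 1 requires this to tend to 1."""
    a = a_n(n, c)
    return ell_star_const(a, c) / (a / n)

def c_n_leading(n, c):
    """Leading-order coalescence probability C N / a_N = 1 / log N."""
    return c * n / a_n(n, c)

C = 1.0
exponents = (3, 6, 12, 100, 300)
ratios = [defining_ratio(10.0 ** k, C) for k in exponents]
for k, ratio in zip(exponents, ratios):
    print(f"N = 1e{k}: ratio = {ratio:.5f}, C N / a_N = {c_n_leading(10.0 ** k, C):.5f}")
```

For C = 1 the ratio equals 1 + log log N/ log N , so it exceeds 1 by more than ten percent even for N = 10 12 . The same slow rate affects any numerical comparison of c N with 1/ log N for realistic population sizes.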
Remark. One may doubt that Theorem 1 is valid when X takes values close to 0 with high probability such that E(1/S N ) = ∞ for all N ∈ N. Typical examples of this form arise when the Laplace transform ψ of X satisfies ψ(u) ∼ L(u) as u → ∞ for some function L slowly varying at ∞, or, equivalently (see Feller [12], p. 445, Theorem 2 and p. 446, Theorem 3), if By Theorem 1 this model is in the domain of attraction of the Kingman coalescent, since E(X 2 ) < ∞. The finiteness or infiniteness of E(1/S N ) turns out to be irrelevant for the statements in Theorem 1, since the convergence results of Theorem 1 solely depend on the limiting behaviour of the joint moments of the weights W 1 , . . . , W j as N → ∞. Conjectures and open problems. Theorem 1 should also hold for Schweinsberg's model [37], since sampling without replacement (instead of sampling with replacement) should influence neither the asymptotics of the coalescence probability nor the limiting processes arising in Theorem 1. Note that in [37] the subclass of models without replacement is studied where the function ℓ in (6) is constant.
We leave the analysis of Schweinsberg's model under the more general assumption (6) for the interested reader. In contrast, conditional branching process models [16,17,19] seem to be harder to analyse and behave quite differently in general. Even for the subclass of so-called compound Poisson models, only partial results are available. Theorems 2.2 and 2.3 of [19] clarify that many unbiased compound Poisson models are in the domain of attraction of the Kingman coalescent, and [19, Theorem 2.5] (subcritical case) demonstrates that the limiting behaviour of compound Poisson models can differ substantially from all scenarios arising in Theorem 1.
To the best of the authors' knowledge, the limiting behaviour of the ancestral structure of unbiased conditional branching process models as N → ∞ under assumptions of the form (6) has not been fully addressed in the literature. We leave this analysis for future research.

Examples
By Theorem 1, for α ≥ 2 the model is in the domain of attraction of the Kingman coalescent, for α ∈ [1, 2) in the domain of attraction of the β(2 − α, α)-coalescent, and for α ∈ (0, 1) in the domain of attraction of the discrete-time Poisson-Dirichlet coalescent with parameter α.
The Pareto example is easily generalized in various ways by replacing ℓ ≡ 1 by some other slowly varying function. For example, choosing for ℓ (a power of) the logarithm leads to the following example.
Example 2 Fix α ≥ 0 and assume that X has tail behaviour P( for some constants c > 0 and β > 0. This example includes the Pareto model (c = β = 1). Clearly, (6) holds, since ℓ slowly varies at ∞. By Theorem 1, for α ≥ 2 the model is in the domain of attraction of the Kingman coalescent, for α ∈ [1,2) in the domain of attraction of the β(2 − α, α)-coalescent, for α ∈ (0, 1) in the domain of attraction of the discrete-time Poisson-Dirichlet coalescent with parameter α, and for α = 0 in the domain of attraction of the discrete-time star-shaped coalescent. Note that The asymptotics of the coalescence probability c N as N → ∞ can hence be obtained from the formulas provided in Theorem 1. In particular, for α > 1 the asymptotics of c N depends on the concrete value of µ := E(X). For α = 1 the asymptotics of c N is obtained as follows. The sequence (a N ) N ∈N , defined via a 1 := 1 and a N : For illustration three examples with discrete X are provided.
) and Γ(.) denote the beta and the gamma function respectively. It is easily checked that P( in the domain of attraction of the β(2 − α, α)-coalescent, and for α ∈ (0, 1) in the domain of attraction of the discrete-time Poisson-Dirichlet coalescent with parameter α. Note that $\ell^*(x) = \Gamma(\alpha+1)\int_1^x t^{-1}\,dt = \Gamma(\alpha+1)\log x$, $x > 1$. In part (iv) of Theorem 1 we can thus choose a N := Γ(α + 1)N log N and obtain c N ∼ ℓ(a N )/ℓ * (a N ) = 1/ log a N ∼ 1/ log N as N → ∞. Thus, by Theorem 1, the coalescence probability c N satisfies The Yule-Simon model is a discrete analog of the Pareto model discussed in Example 1. We refer the reader, for example, to Kozubowski and Podgórski [28] for some further information on Sibuya and Yule-Simon distributions.
Part (v) of Theorem 1 ensures that the model is in the domain of attraction of the Poisson-Dirichlet coalescent with parameter α and the coalescence probability c N satisfies c N → 1 − α as N → ∞. The same results are valid when X is α-stable, α ∈ (0, 1), with Laplace transform ψ(u) := e −u α , u ≥ 0, since in this case the same asymptotics 1 − ψ(u) ∼ u α as u → 0 holds. In this sense the Sibuya example is a discrete version of the α-stable case with α ∈ (0, 1).
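The limit c N → 1 − α in the regime α ∈ (0, 1) can be illustrated by simulation. The sketch below is not a simulation of the Sibuya or the α-stable case: for ease of sampling it substitutes the Pareto tail P(X > x) = x −α , x > 1, which also satisfies (6) with the same α. The population size, number of replications, seed and function names are illustrative assumptions, and the agreement is only approximate for finite N.

```python
import random

def pareto_draw(alpha, rng):
    """Inverse-transform sample with P(X > x) = x^(-alpha) for x > 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return u ** (-1.0 / alpha)

def estimate_c_n(n_pop, alpha, rng, reps):
    """Monte Carlo estimate of c_N = E(W_1^2 + ... + W_N^2)."""
    acc = 0.0
    for _ in range(reps):
        xs = [pareto_draw(alpha, rng) for _ in range(n_pop)]
        total = sum(xs)
        acc += sum((x / total) ** 2 for x in xs)
    return acc / reps

rng = random.Random(2024)
alpha = 0.5
c_hat = estimate_c_n(1000, alpha, rng, reps=400)
print(f"estimated c_N = {c_hat:.3f}, limit 1 - alpha = {1 - alpha:.3f}")
```

The sum of squared weights fluctuates strongly from one realisation to the next in this regime, because a few large values of X dominate S N . A few hundred replications are therefore needed before the average settles near 1 − α.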

Proofs
The following auxiliary result (Lemma 1) is a modified version of Lemma 5 of Schweinsberg [37], adapted to our model. The result may also be viewed as a weak version of Cramér's large deviation theorem (see, for example, [10, Theorem 2.2.3]). Recall that µ := E(X) ∈ (0, ∞]. Lemma 1 For every a ∈ (0, µ) there exists q ∈ (0, 1) such that P(S N ≤ aN ) ≤ q N for all N ∈ N.
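A concrete instance of Lemma 1 can be computed explicitly. The sketch below assumes, purely for illustration, that X is exponentially distributed with mean µ = 1, and uses a standard Chernoff-type bound, which need not coincide step by step with the argument given in the proof. For t > 0, P(S N ≤ aN ) ≤ (e ta E(e −tX )) N = (e ta /(1 + t)) N , and minimising over t gives q = a e 1−a at t = 1/a − 1. The exact probability is available because S N has an Erlang distribution. The function names are hypothetical.

```python
import math

def chernoff_q(a):
    """Optimised Chernoff base for X ~ Exp(1) and 0 < a < 1:
    the infimum over t > 0 of exp(t * a) / (1 + t), attained at t = 1/a - 1."""
    return a * math.exp(1.0 - a)

def erlang_cdf(n, x):
    """Exact P(S_n <= x) for S_n a sum of n i.i.d. Exp(1) random variables."""
    term, lower = 1.0, 0.0  # term holds x^k / k!
    for k in range(n):
        lower += term
        term *= x / (k + 1)
    return 1.0 - math.exp(-x) * lower

a = 0.5
q = chernoff_q(a)
rows = [(n, erlang_cdf(n, a * n), q ** n) for n in (1, 5, 20, 40)]
for n, exact, bound in rows:
    print(f"N = {n}: P(S_N <= aN) = {exact:.3e} <= q^N = {bound:.3e}")
```

In this example q = a e 1−a is strictly less than 1 for every a ∈ (0, 1) and increases to 1 as a ↑ µ = 1, so the exponential decay rate degenerates near the mean.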
Proof. Let f denote the moment generating function of Y := X/a, i.e. f (x) : We now prove part (i) of Theorem 1.
Proof. (of Theorem 1 (i)) We first verify that N c N → ρ/µ 2 as N → ∞. We have . By the law of large numbers, (x/(x/N + S N −1 /N )) 2 → (x/µ) 2 almost surely and, hence, also in distribution as N → ∞. For any r > 0 the map x → x ∧ r is bounded and continuous on [0, ∞). Thus, In order to see that lim sup N →∞ N c N ≤ ρ/µ 2 fix a ∈ (0, µ). By Lemma 1 there exists q ∈ (0, 1) such that P(S N ≤ aN ) ≤ q N for all N ∈ N. Therefore, Thus, lim sup N →∞ N c N ≤ ρ/a 2 . Letting a ↑ µ shows that lim sup N →∞ N c N ≤ ρ/µ 2 and N c N → ρ/µ 2 is established. It is well-known (see [29, Section 4]) that any sequence of Cannings models with population sizes N is in the domain of attraction of the Kingman coalescent if and only if Φ Fix again a ∈ (0, µ) and choose q ∈ (0, 1) as above. We have Clearly, N 2 P(X > aN ) ≤ a −2 E(X 2 1 {X>aN } ) → 0 as N → ∞, since ρ := E(X 2 ) < ∞. It hence remains to verify that N −1 E(X 3 1 {X≤aN } ) → 0 as N → ∞. Let ε > 0. Choose L sufficiently large such that E(X 2 1 {X>L} ) ≤ ε/(2a). Then, for all N ∈ N with N ≥ 2ρL/ε, We now prepare the proofs of the parts (ii) and (iii) of Theorem 1. We need the following two auxiliary results.
Lemma 2 If (6) holds for some α ≥ 0 then for all p > α, Proof. Let T be a nonnegative random variable and f : [0, ∞) → R a continuous and piecewise continuously differentiable function such that f (T ) is integrable. Then, Let x > 0. Applying (8) to T := X/x and f (t) := (t/(t + 1)) p shows that By Theorem 3 of Karamata [22], applied to the function ϕ(x) := P(X > x), which is regularly varying at ∞ with index γ := −α, it follows that, as x → ∞, The same steps, but applied to f (t) :
Proof. (of Theorem 1 (iii)) The idea of the proof is to apply the general convergence result [32, Theorem 2.1]. With (3) in mind, the main task is to derive the asymptotics of the moments of W 1 or, more generally, the asymptotics of the joint moments of the random variables W 1 , . . . , W j as N → ∞. The following proof is based on Schweinsberg's [37] method. We first verify that For all λ > µ := E(X), by the law of large numbers, P(S N −1 ≤ λN ) → 1 as N → ∞. Thus, where the last asymptotics holds by Lemma 2 and since ℓ is slowly varying at ∞. Multiplication by N α /ℓ(N ) and taking lim inf shows that lim To handle the lim sup fix a ∈ (0, µ) and decompose From Lemma 1 it follows that there exist N 0 ∈ N and q ∈ (0, 1) such that P( In order to see this let λ ∈ (a, µ) and decompose The two expectations on the right hand side are both O(ℓ(N )/N α ) by Lemma 2. Moreover, P(S N −1 ≤ λN ) → 0 and P(S N −1 > λN ) → 1 as N → ∞. Therefore, only the last term contributes to the lim sup and we obtain lim sup Letting λ ↑ µ shows that (13) holds. Thus, (12) is established.
Moreover, We now turn to the boundary case α = 2, so we prove part (ii) of Theorem 1.
Proof. (of Theorem 1 (iv)) The proof has much in common with that of part (v). The details are however slightly different. For all x > 0, .
We finally turn to the case α = 0 corresponding to the last part (vi) of Theorem 1.
Proof. (of Theorem 1 (vi)) Let Q N denote the distribution of X 2 + · · · + X N d = S N −1 . For all p > 0, From Lemma 1 it follows that there exists q ∈ (0, 1) such that E(W p Note that the integral on the right hand side of (24) does not depend on the parameter p.

Appendix
For convenience we record the following version of the monotone density theorem.
The following proof of Lemma 4 almost exactly coincides with the proofs known for standard versions of the monotone density theorem (see, for example, Bingham, Goldie and Teugels [3, Theorem 1.7.2] or Feller [12, p. 446]). The proof is provided, since the monotone density theorem in the form of Lemma 4 is heavily used throughout the proofs in Section 4. Proof. (of Lemma 4) Suppose first that g is non-increasing in some right neighborhood of 0. If 0 < a < b < ∞, then, for all x ∈ (0, x 0 /b), G(ax) − G(bx) = (ax,bx] g(y) λ(dy) so, for x small enough, The middle fraction is so the first inequality above yields Taking b := 1 and letting a ↑ 1 gives By a similar treatment of the right inequality with a := 1 and b ↓ 1 we find that the lim inf is at least ρ, and the conclusion follows. The argument when g is non-decreasing in some right neighborhood of 0 is similar. ✷ The following two results are extended versions of Theorem 2 and Theorem 3 of Karamata [22] adapted to our purposes. Lemma 5 provides conditions under which a slowly varying part below an integral can be moved in front of the integral without changing the asymptotics of the integral. Corollary 1 is a similar result for the regularly varying case. The results are slightly more general than those provided in [22], since the functions g N and f N arising in the statements are allowed to depend on N , which is not the case in the formulation of [22].