Optimal transport between determinantal point processes and application to fast simulation

We analyze several optimal transportation problems between determinantal point processes. We show how to estimate some of the distances they induce between distributions of DPPs. We then apply these results to evaluate the accuracy of a new and fast DPP simulation algorithm, with which we can now simulate more than ten thousand points in a reasonable amount of time.


Introduction
Determinantal point processes (DPP) were introduced in the seventies [21] to model fermionic particles with repulsion, like electrons. They recently regained interest since they represent the locations of the eigenvalues of some random matrices. A determinantal point process is characterized by an integral operator of kernel K and a reference measure m. The integral operator is compact and symmetric and is thus characterized by its eigenfunctions and its eigenvalues. Following [18], the eigenvalues are not measurable functions of the realizations of the point process, so it is difficult to see how a modification of the eigenfunctions, respectively of the eigenvalues or of the reference measure, modifies the random configurations of a DPP. Conversely, it is also puzzling to know how the usual transformations on point processes, like thinning, dilations or displacements, translate onto K and m.
A careful analysis of the simulation algorithm given in [18] yields several answers to these questions. For instance, it is clear that the eigenvalues control the distribution of the number of points and that the eigenfunctions determine the positions of the atoms once their number is known. The above mentioned algorithm is a beautiful piece of work but requires drawing points according to distributions whose densities are not expressed as combinations of classical functions, hence the necessity to use a rejection sampling method. Unfortunately, as the number of drawn points increases, the densities quickly present high peaks and deep valleys, inducing a high number of rejections, see Figure 1.
As a consequence, it is hardly feasible to simulate a DPP with more than one thousand points in a reasonable amount of time. As a DPP appears as the locations of the eigenvalues of some matrix ensembles, it may seem faster and simpler to draw random matrices and compute their eigenvalues with the optimized libraries available for that purpose. There are several drawbacks to this approach: 1) we cannot control the domain into which the points fall, and for some applications it may be important to simulate a DPP restricted to some compact set; 2) as eigenvalues belong to R or C, we cannot obtain DPPs in higher dimensions with this approach; 3) for stationary DPPs, it is often useful to simulate under the Palm measure (see below), which is known to correspond to the distribution of the initial DPP with the first eigenvalue removed, so it no longer corresponds to a random matrix ensemble. Several refinements of Algorithm 1 have been proposed along the years but the most advanced contributions have been made for DPPs on lattices, which are of a totally different nature than continuous DPPs. We here propose to speed up the simulation of a DPP by reducing the number of eigenvalues considered and by approximating the eigenfunctions by functions whose cumulative distribution functions can be easily inverted, to get rid of the rejection part.
We evaluate the impact of these approximations by bounding the distances between the original distribution of the DPP to be simulated and the actual distribution according to which the points are drawn.
Actually, there are several notions of distance between the distributions of point processes (see [10] and references therein). We focus here on the total variation distance and on the quadratic Wasserstein distance. The former counts the difference of the numbers of points in an optimal coupling between two distributions. The latter evaluates the matching distance between two realizations of an optimal coupling, provided that it exists.
The paper is organized as follows. We first recall the definition and salient properties of DPPs. In Section 3, we briefly introduce the optimal transportation problem in its full generality and give some elements dedicated to point processes. In Section 4, we show how the eigenvalues and eigenfunctions appear in the evaluation of the distances under scrutiny. In Section 5, we apply these results to the simulation of DPPs.

Determinantal point processes
Let E be a Polish space, O(E) the family of all non-empty open subsets of E and B the corresponding Borel σ-algebra. In the sequel, m is a Radon measure on (E, B). Let N be the space of locally finite subsets of E, also called the configuration space, equipped with the topology of vague convergence. We call elements of N configurations and identify a locally finite configuration ξ with the atomic Radon measure ∑_{y∈ξ} ε_y, where we have written ε_y for the Dirac measure at y ∈ E. Next, let N_f = {ξ ∈ N : |ξ| < ∞} be the space of all finite configurations on E; N_f is naturally equipped with the trace σ-algebra F_f. A random point process is defined as a probability measure on (N, F). A random point process µ is characterized by its Laplace transform, which is defined for any measurable non-negative function f on E as

L_µ(f) = ∫_N exp( − ∑_{y∈ξ} f(y) ) dµ(ξ).

Our notations are inspired by those of [14], where the reader can also find a brief summary of many properties of Papangelou intensities.

Definition 1. We define the m-sample measure L on (N_f, F_f) by the identity

∫_{N_f} f(ξ) dL(ξ) = ∑_{n≥0} (1/n!) ∫_{E^n} f({x_1, …, x_n}) m(dx_1) ⋯ m(dx_n),

for any measurable nonnegative function f on N_f.
Point processes are often characterized via their correlation functions, defined by

E_µ[ ∑_{α⊂ξ, α∈N_f} f(α) ] = ∫_{N_f} f(ξ) ρ(ξ) dL(ξ),

for all measurable nonnegative functions f on N_f. For ξ = {x_1, …, x_n}, we will write ρ(ξ) = ρ_n(x_1, …, x_n) and call ρ_n the n-th correlation function; ρ_n is a symmetric function on E^n.
It can be noted that correlation functions can also be defined by the following property, both characterizations being equivalent in the case of simple point processes.
Recall that ρ_1 is the mean density of particles with respect to m, and that ρ_n(x_1, …, x_n) m(dx_1) ⋯ m(dx_n) is the probability of finding a particle in the vicinity of each x_i, i = 1, …, n.
Note that since N_f can be identified with ⋃_{n≥0} E^n/S_n, where S_n is the group of permutations over n elements, every function f : N_f → R is in fact equivalent to a family of symmetric functions (f_n, n ≥ 1), where f_n goes from E^n to R. For the sake of notations, we omit the index n of f_n.

Definition 4. A measure µ on N_f is regular with respect to the reference measure m when there exists a family (j_n, n ≥ 0), where j_n is symmetric on E^n, such that for any measurable bounded f,

(1) ∫ f dµ = ∑_{n≥0} (1/n!) ∫_{E^n} f_n(x_1, …, x_n) j_n(x_1, …, x_n) m(dx_1) ⋯ m(dx_n).

The function j_n is called the n-th Janossy density. Intuitively, j_n(x_1, …, x_n) m(dx_1) ⋯ m(dx_n) can be viewed as the probability to have exactly n points, located in the vicinity of x_1, …, x_n. For details about the relationships between correlation functions and Janossy densities, see [7].

2.1. Determinantal point processes. For details, we mainly refer to [24]. A determinantal point process on E is characterized by a kernel K and a reference measure m. The map K is supposed to be a Hilbert-Schmidt operator from L²(E, m) into L²(E, m) which satisfies the following conditions: (1) K is a bounded symmetric integral operator on L²(E, m) with kernel K(·, ·), i.e., for any x ∈ E,

Kf(x) = ∫_E K(x, y) f(y) m(dy);

(2) the spectrum of K is included in [0, 1].
(3) The map K is locally of trace class, i.e., for all compact Λ ⊂ E, the restriction K_Λ of K to L²(Λ, m) is of trace class.

Definition 5. The determinantal measure on N with characteristics K and m can be defined through its correlation functions:

ρ_{n,K}(x_1, …, x_n) = det( K(x_i, x_j) )_{1≤i,j≤n} for n ≥ 1,

and for n = 0, ρ_{0,K}(∅) = 1.
There is a particular class of DPPs which are the basic blocks from which general DPPs are built.

Definition 6. A DPP whose spectrum is reduced to the singleton {1} is called a projection DPP. Actually, its kernel is of the form

K(x, y) = ∑_{j=0}^{M} φ_j(x) φ_j(y),

where M ∈ N ∪ {∞} and (φ_j, j = 0, …, M) is a family of orthonormal functions of L²(E, m).
If M is finite then almost-all configurations of such a point process have M atoms.
Alternatively, when the spectrum of K does not contain 1, we can define a DPP through its Janossy densities. In this situation, the properties of K ensure that there exist a sequence (λ_i, i ≥ 1) of elements of [0, 1) with no accumulation point but 0 and a complete orthonormal basis (φ_i, i ≥ 1) of L²(m) such that

K(x, y) = ∑_{i≥1} λ_i φ_i(x) φ_i(y).

Note that if L²(E, m) is a C-vector space, we must modify this definition accordingly: K(x, y) = ∑_{i≥1} λ_i φ_i(x) φ̄_i(y). For a compact subset Λ ⊂ E, the map J_Λ is defined by

J_Λ = (Id − K_Λ)⁻¹ K_Λ, so that K_Λ = J_Λ (Id + J_Λ)⁻¹.

For any compact Λ ⊂ E, the operator J_Λ is a Hilbert-Schmidt, trace class operator whose spectrum is included in [0, +∞). We denote by J_Λ its kernel. For any n ∈ N, any compact Λ ⊂ E, and any (x_1, …, x_n) ∈ Λ^n, the n-th Janossy density is given by:

(2) j_n^Λ(x_1, …, x_n) = det(Id − K_Λ) det( J_Λ(x_i, x_j) )_{1≤i,j≤n}.

We can now state how the characteristics of a DPP are modified by some usual transformations on the configurations.

Theorem 1. Let µ be a DPP on R^k with kernel K and reference measure m = h dx. Let (λ_n, n ≥ 0) be its eigenvalues counted with multiplicity and (φ_n, n ≥ 0) the corresponding eigenfunctions.
(1) A random thinning of probability p transforms µ into a DPP of kernel pK.
(2) A dilation of ratio ρ transforms µ into a DPP whose kernel and reference measure are rescaled accordingly; more generally, a diffeomorphism H transforms µ into a DPP of kernel K ∘ (H⁻¹ × H⁻¹) and reference measure m ∘ H⁻¹, the image measure of m by H, see [6]. When µ is stationary [20], it admits a stationary Palm measure µ_0. From [24], we know that µ_0 is distributed as a DPP whose kernel has, in L²(E, m), the same spectrum as K. Actually, this transformation is a particular case of the optimal maps obtained in solving the MKP for the Wasserstein-2 distance (see Theorem 12).
Remark 2. Recall that for a Poisson process, we can obtain a realization of its Palm measure by just adding an atom at 0 to any of its realizations. Unfortunately, we know from [18] that the eigenvalues of a DPP cannot be obtained as measurable functions of the configurations. Hence it is hopeless to construct a realization of µ_0 from a realization of µ.
2.1.1. Simulation of DPP. The simulation algorithm introduced in [18] is up to now the most efficient one to produce random configurations distributed according to a determinantal point process. It is based on the following lemma.
Lemma 2. Let µ_{K,m} be a determinantal point process of trace-class kernel K and reference measure m, with eigenvalues (λ_n, n ≥ 0) and eigenfunctions (φ_n, n ≥ 0). Let (B_n, n ≥ 0) be independent Bernoulli random variables of respective parameters (λ_n, n ≥ 0) and set I = {n : B_n = 1}. Construct a random configuration ξ as follows: given I, draw |I| points with the joint density

p_I(x_1, …, x_{|I|}) = (1/|I|!) det( K_I(x_i, x_j) )_{1≤i,j≤|I|}, where K_I(x, y) = ∑_{n∈I} φ_n(x) φ_n(y).

Then ξ is distributed according to µ_{K,m}. In the following, let φ_I(x) = (φ_n(x), n ∈ I).

Algorithm 1 (Data: R, I): sampling of the locations of the points given the set I of active Bernoulli random variables; each W_i is drawn from the conditional density of the next point given the previous ones.
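To fix ideas, the two phases of Lemma 2 can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the orthonormal cosine family on [0, 1] is a hypothetical stand-in for the eigenfunctions of K, and the rejection step is replaced by a crude inverse-CDF draw on a grid; only the structure (Bernoulli phase, then chain-rule sampling with Gram-Schmidt updates) mirrors Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical orthonormal family on [0, 1] (cosine basis), standing in for
# the eigenfunctions (phi_n) of a trace-class kernel K; idx lists the active n's.
def phi_vec(x, idx):
    return np.sqrt(2.0) * np.cos(np.pi * (idx + 1) * x)

def sample_dpp(lambdas, n_grid=2000):
    # Phase 1: independent Bernoulli(lambda_n) draws select the index set I.
    idx = np.flatnonzero(rng.random(len(lambdas)) < lambdas)
    grid = np.linspace(0.0, 1.0, n_grid)
    points, basis = [], []   # basis: Gram-Schmidt vectors e_1, ..., e_l
    for _ in range(len(idx)):
        # Conditional density of the next point given the previous ones:
        # p(x) proportional to ||phi_I(x)||^2 - sum_j |<e_j, phi_I(x)>|^2.
        vals = np.stack([phi_vec(x, idx) for x in grid])
        dens = (vals ** 2).sum(axis=1)
        for e in basis:
            dens = dens - (vals @ e) ** 2
        dens = np.clip(dens, 0.0, None)
        # Crude inverse-CDF draw on the grid (stand-in for rejection sampling).
        cdf = np.cumsum(dens)
        x = grid[min(np.searchsorted(cdf, rng.random() * cdf[-1]), n_grid - 1)]
        points.append(x)
        # Extend the orthonormal basis with phi_I(x) (Gram-Schmidt step).
        v = phi_vec(x, idx)
        for e in basis:
            v = v - (v @ e) * e
        basis.append(v / max(np.linalg.norm(v), 1e-12))
    return np.array(points)

pts = sample_dpp(np.array([0.9, 0.8, 0.7, 0.2]))
```

The number of atoms is random: it is the number of Bernoulli variables that fired, here at most four.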
We have two kinds of difficulties here: the drawing of W_i according to a density function with no particular feature, so we usually have to resort to rejection sampling; and, when |I| is large, the computation of the density may be costly as it contains a sum of |I| terms. Figure 1 also suggests that when the number of points becomes high, the profile of the conditional density might be very chaotic, with high peaks and deep valleys, involving a large number of rejections in the sampling from this density. These are the problems we intend to address in the following.
Remark 3. Note that this algorithm is fully applicable even if E is a discrete finite space. It has been improved in several ways [19, 25] but when it comes to simulating a DPP with a large number of points, as is necessary in some applications [5], the best way remains to use MCMC methods [1]. Unfortunately, by its very construction, this last approach is not feasible when the underlying space E is continuous.

Distances derived from optimal transport
For details on optimal transport in R^d and in general Polish spaces, we refer to [27, 26]. For X and Y two Polish spaces and for µ (respectively ν) a probability measure on X (respectively Y), Σ(µ, ν) is the set of probability measures on X × Y whose first marginal is µ and second marginal is ν. We also need to consider a lower semi-continuous function c from X × Y to R⁺. The Monge-Kantorovitch problem associated to µ, ν and c, denoted by MKP(µ, ν, c) for short, consists in finding

(3) inf_{γ∈Σ(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y).
More precisely, since X and Y are Polish and c is l.s.c., it is known from the general theory of optimal transportation that there exists an optimal measure γ ∈ Σ(µ, ν) and that the minimum coincides with

sup ( ∫_X F dµ + ∫_Y G dν ),

where (F, G) are such that F ∈ L¹(dµ), G ∈ L¹(dν) and F(x) + G(y) ≤ c(x, y).
We will denote by W_c(µ, ν) the value of the infimum in (3). In the sequel, we need the following theorem of Brenier.

Theorem 3. Let e(x, y) = ‖x − y‖² be the squared Euclidean distance on R^k and µ, ν two probability measures with finite second moment. If the measure µ is absolutely continuous with respect to the Lebesgue measure, there exists a unique optimal measure γ_opt which realizes the minimum in (3). Moreover, there exists a unique (up to an additive constant) convex function ψ : R^k → R such that γ_opt = (Id × ∇ψ)#µ. Then, we have

W_e(µ, ν) = ∫_{R^k} ‖x − ∇ψ(x)‖² dµ(x).

The square root of W_e(µ, ν) defines a distance on M_1(R^k), the set of probability measures on R^k, called the Wasserstein-2 distance.
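In dimension one, Brenier's map ∇ψ is the monotone rearrangement F_ν⁻¹ ∘ F_µ, so for empirical measures with the same number of atoms the optimal coupling simply matches sorted samples. A minimal illustration (not from the paper):

```python
import numpy as np

def w2_squared_empirical(xs, ys):
    """Squared Wasserstein-2 distance between two empirical measures on R
    with the same number of atoms: in dimension one, Brenier's map is the
    monotone rearrangement, so the optimal coupling matches sorted samples."""
    xs, ys = np.sort(np.asarray(xs, float)), np.sort(np.asarray(ys, float))
    assert len(xs) == len(ys)
    return float(np.mean((xs - ys) ** 2))

# Translating a sample by t moves every atom by t, so the distance is t^2.
xs = np.array([0.1, 0.5, 0.9])
print(w2_squared_empirical(xs, xs + 0.2))   # close to 0.04
```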
For c a distance on X = Y, W_c also defines a distance on M_1(X), often called the Kantorovitch-Rubinstein or Wasserstein-1 distance. It admits the following alternative characterization.
Theorem 4 (See [11]). Let c be a distance on the Polish space (X, d_X). For µ and ν two probability measures on X,

W_c(µ, ν) = sup { ∫_X F dµ − ∫_X F dν : F is 1-Lipschitz with respect to c }.

Note that d_X is the distance which defines the topology of the Polish space X; it may not be identical to c.
The next result is found in [26, Chapter 7].
Theorem 5.The topologies induced by W c and W e on M 1 (R k ) are strictly stronger than the topology of convergence in distribution.

Distances between point processes
There are several ways to define a distance between point processes. We here focus on two of them. They are constructed similarly: choose a cost function c on N_E and then consider W_c defined by the solution of MKP(µ, ν, c) for µ and ν two elements of M_1(N_E).

Definition 7. Consider dist_TV, the distance in total variation between two configurations (viewed as discrete measures):

dist_TV(ξ, ζ) = |ξ∆ζ|,

where ξ∆ζ is the symmetric difference between the two sets ξ and ζ, i.e. we count the number of distinct atoms between ξ and ζ. Then, for µ and ν belonging to M_1(N_E), their Kantorovitch-Rubinstein distance is defined by

(4) W_KR(µ, ν) = inf_{γ∈Σ(µ,ν)} ∫_{N_E×N_E} dist_TV(ξ, ζ) dγ(ξ, ζ).

By Theorem 4, it is also the supremum of E[F(ξ)] − E[F(ζ)] over the functionals F : N_E → R which are 1-Lipschitz with respect to dist_TV.

Let (µ_n, n ≥ 1) be a sequence of point processes and denote by ξ_n an N_E-valued random variable whose distribution is µ_n. Similarly, for another element ν ∈ M_1(N_E), let ζ be an N_E-valued random variable whose distribution is ν. In view of (4) and Theorem 4, if W_KR(µ_n, ν) tends to zero then, for any compact set Λ, the sequence of random variables (ξ_n(Λ), n ≥ 1) converges in distribution to ζ(Λ).
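Viewing simple finite configurations as sets of atoms, dist_TV is just the cardinality of a symmetric difference; a one-line sketch:

```python
def dist_tv(xi, zeta):
    """dist_TV between two simple finite configurations, viewed as sets of
    atoms: the cardinality of the symmetric difference xi Δ zeta."""
    return len(set(xi) ^ set(zeta))

# {0.1, 0.4} and {0.1, 0.5, 0.9} share one atom; three atoms are not shared.
assert dist_tv({0.1, 0.4}, {0.1, 0.5, 0.9}) == 3
```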
For the quadratic distance, we first consider, on E = R^k, the cost function ρ(x, y) = 2⁻¹ ‖x − y‖² and we define a cost between configurations (see also [4, 3, 2]) as the 'lifting' of ρ on N_E:

c(ξ_1, ξ_2) = inf_{β∈Σ(ξ_1,ξ_2)} ∫_{E×E} ρ(x, y) dβ(x, y),

where Σ(ξ_1, ξ_2) denotes the set of β ∈ N_{E×E} having marginals ξ_1 and ξ_2. First remark that when ξ_1(E) is finite, the cost is finite only if ξ_1(E) = ξ_2(E); otherwise Σ(ξ_1, ξ_2) is empty and then, by convention, the cost is infinite. Moreover, the cost is attained at the permutation σ of {1, …, ξ_1(E)} which minimizes the sum of the squared distances ∑_{j=1}^{ξ_1(E)} ‖x_j − y_{σ(j)}‖². For infinite configurations, it is not immediate that the cost function so defined has the minimum regularity required to consider an optimal transport problem. According to [23], this is indeed true as c is lower semi-continuous on N_E × N_E. We can then consider the Monge-Kantorovitch problem MKP(µ, ν, c) on M_1(N_E). The main theorem of [8] is the following. For Λ a compact subset of E, by definition of a locally finite point process, the number of points of ξ_{|Λ} is finite, hence we can write µ = ∑_{n≥0} c_n µ_n, where c_n = µ(ξ(Λ) = n) and µ_n is the conditional distribution of µ given {ξ(Λ) = n}. A probability measure µ on N_E is said to be regular whenever it admits Janossy densities of any order.

Theorem 6. Let Λ ⊂ E be a compact set. Let µ be a regular probability measure on N_Λ and ν be a probability measure on N_E. The Monge-Kantorovitch distance, associated to c, between µ and ν is finite if and only if the following two conditions hold: (1) the distributions of the numbers of points coincide under µ and ν; (2) the transport costs between the corresponding normalized Janossy densities are summable. Then, the solution of MKP(µ, ν, c) is attained at a unique point γ_opt and there exists a unique map ϕ such that, on {ξ(Λ) = n}, ϕ coincides with Id − ∇ψ_n, where Id − ∇ψ_n is the optimal transportation map between the n-th normalized Janossy densities c_n⁻¹ j_n^µ and c_n⁻¹ j_n^ν. This means that whenever the distance between µ and ν is finite, there exists a strong coupling which works as follows: 1) draw a discrete random variable with the distribution of ξ(Λ), let ι be the obtained value; 2) draw the points of ξ according to µ_ι; and then 3) apply the map ϕ_ι(·, ξ) to each point of ξ. The configuration which is obtained is distributed according to ν_ι.
It is shown in [8] that for two Poisson point processes of respective intensities σ_1 and σ_2, the distance defined above is finite if and only if ∫ ‖x − t(x)‖² dσ_1(x) < ∞, where t is the optimal transport map between σ_1 and σ_2 for the Euclidean cost as defined in Theorem 3. Note that this optimal map is a transformation which is applied to each atom irrespective of the others. In full generality, for non-Poisson processes, the amount by which an atom is moved depends on the other locations.
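For two finite configurations of equal cardinality, the lifted cost c is the minimum over permutations described above and can be computed directly. The helper below is a sketch (hypothetical name, brute force over permutations for readability):

```python
import numpy as np
from itertools import permutations

def config_cost(xi, zeta):
    """c(xi, zeta) = (1/2) * min over permutations sigma of
    sum_j ||x_j - y_sigma(j)||^2, and +infinity when the cardinalities
    differ (Sigma(xi, zeta) is then empty). Brute force here; for large
    configurations one would use the Hungarian algorithm
    (e.g. scipy.optimize.linear_sum_assignment)."""
    xi, zeta = np.atleast_2d(xi), np.atleast_2d(zeta)
    if len(xi) != len(zeta):
        return float("inf")
    best = min(
        sum(float(((xi[j] - zeta[s]) ** 2).sum()) for j, s in enumerate(sigma))
        for sigma in permutations(range(len(zeta)))
    )
    return 0.5 * best

xi = np.array([[0.0, 0.0], [1.0, 0.0]])
zeta = np.array([[1.1, 0.0], [0.0, 0.1]])
# The optimal matching pairs (0,0) with (0,0.1) and (1,0) with (1.1,0).
print(config_cost(xi, zeta))   # close to 0.01
```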
4.1. Distances between DPPs. For determinantal point processes, we can evaluate the effect of a modification of the eigenvalues with the Kantorovitch-Rubinstein distance and the effect of a modification of the eigenfunctions with the Wasserstein-2 distance.
Lemma 7. Let µ and ν be two determinantal point processes with respective kernels K_µ and K_ν. Assume that K_µ and K_ν are two projection kernels in some Hilbert space L²(m) such that K_µ = K_ν + L, where L is another projection kernel which is orthogonal to K_ν. Then,

(6) W_KR(µ, ν) ≤ dim range(L).
Proof. The hypothesis means that there exist orthonormal families spanning the ranges of K_ν and of L, whose union spans the range of K_µ. Since L is a positive symmetric operator, this exactly means that K_ν ≺ K_µ in the Loewner sense. According to [15], there exist ξ, ζ of respective distributions µ, ν and a point process ω such that ξ has the same distribution as ζ ∪ ω. This implies that (ξ∆ζ)(E) = ω(E) = l, where l = dim range(L).
According to the first definition of W_KR, see (4), this implies (6).
Theorem 8. Let µ (respectively ν) be a determinantal point process of characteristics K_µ and h_µ (respectively K_ν and h_ν) on a compact set Λ ⊂ R^k. Denote by (λ_n^µ, n ≥ 0) (respectively (λ_n^ν, n ≥ 0)) the eigenvalues of K_µ in L²(E, h_µ dx) (respectively of K_ν in L²(E, h_ν dx)), counted with multiplicity and ranked in decreasing order. Assume that λ_n^ν ≤ λ_n^µ for all n ≥ 0. Then,

(7) W_KR(µ, ν) ≤ ∑_{n≥0} (λ_n^µ − λ_n^ν).

Proof. We make a coupling of (B(λ_n^µ), n ≥ 0) and (B(λ_n^ν), n ≥ 0) by using the same sequence of uniform random variables: let (U_n, n ≥ 0) be a sequence of independent random variables, identically uniformly distributed over [0, 1], and consider X_n^µ = 1_{U_n ≤ λ_n^µ} and X_n^ν = 1_{U_n ≤ λ_n^ν}. In view of the hypothesis, X_n^ν ≤ X_n^µ, hence I_ν ⊂ I_µ. Otherwise stated, K_{I_µ} and K_{I_ν} are two projection kernels which satisfy the hypothesis of Lemma 7. Hence, there exists a realization (ξ, ζ) of Σ(µ, ν) given I_µ and I_ν such that

(8) E[ dist_TV(ξ, ζ) | I_µ, I_ν ] ≤ |I_µ \ I_ν|.

Gluing these realizations together, we get a coupling (ξ, ζ) such that E[dist_TV(ξ, ζ)] ≤ ∑_{n≥0} (λ_n^µ − λ_n^ν), according to (8). Since the Kantorovitch-Rubinstein distance is obtained as the infimum over all couplings of the total variation distance between ξ and ζ, this particular construction shows that (7) holds.
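The coupling used in this proof is easy to check numerically: driving both Bernoulli sequences with the same uniforms makes the active sets nested, and the mean size of their symmetric difference is the sum of the eigenvalue gaps. A sketch with toy eigenvalues (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Coupling from the proof of Theorem 8: the same uniforms U_n drive both
# Bernoulli sequences, so the active sets are nested whenever
# lam_nu <= lam_mu coordinatewise. Toy eigenvalues below.
lam_mu = np.array([0.9, 0.7, 0.5, 0.3])
lam_nu = np.array([0.8, 0.6, 0.5, 0.1])

n_trials = 100_000
u = rng.random((n_trials, len(lam_mu)))
i_mu, i_nu = u < lam_mu, u < lam_nu

assert not np.any(i_nu & ~i_mu)   # I_nu is contained in I_mu in every trial

# Mean size of the symmetric difference of the active sets; it estimates
# sum(lam_mu - lam_nu) = 0.4, the quantity bounding W_KR in Theorem 8.
mean_sym_diff = np.mean(np.count_nonzero(i_mu ^ i_nu, axis=1))
print(mean_sym_diff)
```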
The next corollary is an immediate consequence of the alternative definition of the KR distance on point processes, see Eqn. (4).

Corollary 9. With the hypothesis of Theorem 8, let ξ and ζ be point processes of respective distributions µ and ν. Then, for any compact Λ, we have |E[ξ(Λ)] − E[ζ(Λ)]| ≤ W_KR(µ, ν).

This means that the Kantorovitch-Rubinstein distance between point processes focuses on the number of atoms in any compact. As we shall see now, the Wasserstein-2 distance evaluates the matching distance between configurations when they have the same cardinality.
Theorem 10. Let µ (respectively ν) be a determinantal point process of characteristics K_µ and h_µ (respectively K_ν and h_ν) on a compact set Λ ⊂ R^k. The Wasserstein-2 distance between µ and ν is finite if and only if the spectra of K_µ and K_ν coincide, counted with multiplicity.

Proof of Theorem 10. According to point 1 of Theorem 6, we must first prove the equality of the spectra. We already know that the eigenvalues of both kernels lie between 0 and 1, with no accumulation point other than 0. Furthermore, the distribution of ξ(Λ) is that of a sum of independent Bernoulli random variables of parameters given by the eigenvalues, hence

Φ_µ(z) := E[ (1 − z)^{ξ(Λ)} ] = ∏_{n≥0} (1 − λ_n^µ z).

The infinite product is convergent since trace K_µ = ∑_{λ∈sp K_µ} λ is finite.
If the Wasserstein-2 distance between µ and ν is finite then Φ_µ = Φ_ν. The zeros of these two holomorphic functions are all greater than 1 and are isolated. Let m(Φ, r) be the number of zeros (counted with multiplicity) of Φ in B(0, r). By the properties of zeros of holomorphic functions, we have m(Φ_µ, r) = m(Φ_ν, r) for every r > 0, and the two spectra must coincide. Conversely, by the very definition of W_e, the distance is bounded by the cost of any admissible coupling, and this quantity is finite as soon as the spectra are equal; hence so is the Wasserstein-2 distance between µ and ν.
The next lemma is a straightforward consequence of Lemma 2.
Lemma 11. Let µ be a determinantal point process of characteristics K_µ and h_µ. For I a finite subset, let

q_I = ∏_{i∈I} λ_i^µ ∏_{i∉I} (1 − λ_i^µ),

where the λ_i^µ's are the eigenvalues of K_µ. Then, its n-th Janossy density is a convex combination, weighted by the q_I with |I| = n, of the densities of the corresponding projection DPPs. This means that given ζ(E) = n, the points are distributed according to the probability measure

p_n^µ = ( ∑_{|I|=n} q_I )⁻¹ ∑_{|I|=n} q_I p_I.

Proof. Consider that ξ is constructed with Algorithm 1 and denote by I_ξ the set of indices of the Bernoulli random variables which are equal to 1 in the drawing of ξ. For any bounded f,

E[f(ξ)] = ∑_I P(I_ξ = I) E[f(ξ) | I_ξ = I].

The result follows by identification with (1).
Then, Theorem 6 applies as follows.
Theorem 12. Suppose that the hypotheses of Theorem 10 hold. Let Id − ∇ψ_n be the optimal transport map between p_n^µ and p_n^ν. Then, the optimal coupling is given by the following rule: a configuration ξ such that ξ(E) = n is coupled with the configuration ζ with n atoms obtained by applying Id − ∇ψ_n to each atom of ξ.

Theorems 10 and 12 mean that two determinantal point processes are strongly coupled if and only if their eigenvalues are identical. Moreover, the eigenvalues also control the convex combination of the densities of the projection DPPs which appear in the Janossy densities.

4.2. Determinantal projection processes. Recall from Definition 6 that a projection DPP has a spectrum reduced to {1}. When it is of finite rank M, almost all its configurations have M points, distributed according to the density

(10) p_ψ(x_1, …, x_M) = (1/M!) det( K_ψ(x_i, x_j) )_{1≤i,j≤M}, where K_ψ(x, y) = ∑_{j=1}^{M} ψ_j(x) ψ_j(y).

Theorem 12 cannot be used as is since projection DPPs do not possess Janossy densities. However, the initial definition of W_c can still be used.
Theorem 13. Let ψ = (ψ_j, 1 ≤ j ≤ M) and φ = (φ_j, 1 ≤ j ≤ M) be two orthonormal families of L²(E; m) and let µ_ψ and µ_φ be the two projection DPPs associated to these families. Then, W_c(µ_ψ, µ_φ) is controlled by the L²(E; m) distances between the ψ_j's and the φ_j's.

Proof. We know that the points of µ_ψ (respectively µ_φ) are distributed according to p_ψ (respectively p_φ) given by (10). Let γ be a probability measure on E^M × E^M whose marginals are p_ψ dm and p_φ dm. We know from Algorithm 1 that the marginal distribution of a single atom of µ_ψ is

M⁻¹ ∑_{j=1}^{M} |ψ_j(x)|² m(dx).

Since p_ψ and p_φ are both invariant with respect to permutations, we obtain the announced bound. Since we can order the elements of the families ψ and φ in any order, the result follows.

Simulation
As mentioned previously, implementing Algorithm 1 using rejection sampling involves too many rejections, which prevents the algorithm from working for more than 1 000 points. In this section, we show that we can generate 10 000 points of the Ginibre point process on a compact disc using inverse transform sampling and an approximation of the kernel.
In this section, we consider Ginibre point processes, but our reasoning can be applied to any rotationally invariant determinantal process, like the polyanalytic ensembles [16, 13] or the Bergman process [17]. For these processes, it is relatively easy to compute the eigenvalues and the eigenfunctions of the kernel of their restriction to a ball centered at the origin. The Ginibre process will be our toy model; its restriction to B_R, denoted by G_R, has a kernel of the form

K_R(x, y) = ∑_{n≥0} λ_n^R φ_n^R(x) φ̄_n^R(y),

where [9] λ_n^R = γ(n + 1, R²)/n! and φ_n^R(x) is proportional to x^n e^{−|x|²/2}, with γ(n, r) the lower incomplete gamma function. We denote by G_R^N the process whose kernel is the truncation of K_R to its first N components:

K_R^N(x, y) = ∑_{n=0}^{N−1} λ_n^R φ_n^R(x) φ̄_n^R(y).

The strict application of Algorithm 1 for the simulation of G_R requires computing all the quantities λ_n^R, n ≥ 0, to determine which Bernoulli random variables are active. Strictly speaking, this is unfeasible. However, it is a well known observation that the number of points of G_R is of the order of R². Actually, the proof says that, with high probability, G_R and G_R^{N_R} do coincide.
As a corollary of the previous proof, we have (11) for R large enough. This means that the number of active Bernoulli random variables in Algorithm 1 is less than (R + c)² with high probability. We can also provide a lower bound on the cardinality of I.
Lemma 15. For any R > c > 0, the cardinality of I is at least (R − c)² with high probability.

Proof. As in the previous proof, we reduce the problem to bounding a sum of reduced incomplete gamma functions. Using the formula Γ(n + 1, x) = nΓ(n, x) + x^n e^{−x}, we obtain by induction a closed form for the partial sums. For n = (R − c)² and x = R², this yields the desired bound. The proof is thus complete.
The combination of Lemma 15 and (11) shows that the cardinality of I is of the order of R² with high probability.
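With the normalization above, λ_n^R = γ(n + 1, R²)/n! is the regularized lower incomplete gamma function P(n + 1, R²). Assuming that normalization, a quick numerical check of the "order R²" behaviour of |I| (a sketch, not the paper's code):

```python
import numpy as np
from scipy.special import gammainc   # regularized lower incomplete gamma P(a, x)

R, c = 10.0, 3.0
n = np.arange(int((R + c) ** 2))
lam = gammainc(n + 1, R ** 2)        # lambda_n^R = gamma(n+1, R^2)/n!

inner = int((R - c) ** 2)
print(lam[:inner].min())   # ~1: the first (R-c)^2 Bernoullis almost surely fire
print(lam[-1])             # ~0: indices beyond (R+c)^2 contribute almost nothing
print(lam.sum())           # ~R^2: expected number of points of G_R
```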
5.1. Inverse transform sampling. The next step of the algorithm is to draw the points according to a density given by a determinant. Since we do not have an explicit expression of the inverse cumulative distribution function of these densities, we would a priori have to resort to rejection sampling. Fortunately, even if it has not been noticed before, to the best of our knowledge, the particular form of the eigenfunctions of Ginibre-like processes lends itself to the simulation of the moduli and arguments of the points by inversion of their respective cumulative distribution functions. This new approach is summarized in Algorithm 2.
Lemma 16 (Simulation of the moduli). Let p(z) = ∑_{i∈I} a_i z^{n_i} f_i(|z|) and let F_i denote the cumulative distribution function of the modulus associated to the i-th component. Given a sequence of complex numbers (W_ℓ), for ℓ from 1 to |I| we denote by e_ℓ the orthonormal vectors obtained by Gram-Schmidt orthonormalization of the vectors φ_I^R(W_ℓ). Let also M_ℓ ∈ R^{|I|} be the vector (|e_{ℓ,i}|²)_{i∈I}, where e_{ℓ,i} is the coordinate of index i of e_ℓ. Moreover, let U_F(r) = (F_i(r))_{i∈I}. Finally, let (U_i) be the sequence of vectors defined by induction with U_1 = (1)_{i∈I} and U_{i+1} = U_i − M_i. Then drawing the modulus of W_i in Algorithm 1 reduces to sampling c_i uniformly in [0, 1] and solving, in r, Equation (12), whose left-hand side is the normalized combination ⟨U_i, U_F(r)⟩ of the F_i. Knowing U_i, we can compute U_{i+1} in O(|I|) arithmetic operations. Using a dichotomy approach, Equation (12) can be solved with precision ε using O(|I| log(1/ε)) evaluations of the F_i.
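Solving Equation (12) only requires inverting a nondecreasing function of r, which is where the log(1/ε) factor comes from. A generic bisection sketch, with a hypothetical closed-form CDF standing in for the combination of the F_i:

```python
def invert_cdf(F, c, lo, hi, eps=1e-10):
    """Solve F(r) = c for a nondecreasing F on [lo, hi] by bisection,
    using O(log((hi - lo)/eps)) evaluations of F."""
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if F(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical modulus CDF on [0, R] (not one of the paper's F_i):
# F(r) = r^2 / R^2, the law of the modulus of a uniform point in B_R.
R = 2.0
r = invert_cdf(lambda t: t * t / (R * R), 0.25, 0.0, R)   # exact root: R/2
```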
Given the moduli, we can now simulate the arguments.
Lemma 17 (Simulation of the arguments). Let p(z) = ∑_{i∈I} a_i z^{n_i} f_i(|z|). Then the conditional density of an argument can be rewritten as a sum of |I|² terms G_{i,k}. Similarly to the simulation of the moduli, for ℓ from 1 to |I|, let A_ℓ ∈ C^{|I|²} be the vector (e_{ℓ,i} ē_{ℓ,k})_{i,k∈I}. Let V_G(r, α) = (G_{i,k}(r, α))_{i,k∈I} and let (V_i)_{i=1…|I|−1} be the sequence of vectors defined by recurrence, with V_{i+1} = V_i − A_i. Drawing the argument of W_i in Algorithm 1 is now reduced to sampling c_i uniformly in [0, 1] and solving Equation (13). Computing V_{i+1} from V_i requires O(|I|²) arithmetic operations. Then, for fixed r, Equation (13) can be solved up to precision ε in O(|I|² + |I| log(1/ε)) arithmetic operations and evaluations of the f_i, using a dichotomy approach.
The total cost of sampling the W_i with this approach is O(|I|³ + |I|² log(1/ε)) operations. We will see in the next section how to reduce this complexity using an approximation of the eigenfunctions.
Gathering the results of this section, we get in Algorithm 2 an efficient method to sample points from a symmetric projection point process.

Algorithm 2 (Data: R, I): simulation of a compact symmetric projection point process restricted to the disc B_R.

5.2. Compact Ginibre and approximation. Using Theorem 13 with a well-chosen approximation, we will show that we can reduce the complexity of steps A and B of Algorithm 2 from O(|I|²) to O(|I|^{1.5}) operations with high probability.
For a given constant c > 0 and for an integer n, let R_n be the ring between the circles of radii u_n = min(R, √n + c) and l_n = max(0, min(√n, R) − c). Define the approximated functions φ̃_n^R by truncating the φ_n^R to the rings R_n, where their mass concentrates. We now show that replacing φ_n^R by φ̃_n^R does not cost much in terms of accuracy: the L² distance between φ_n^R and φ̃_n^R reduces to estimates on incomplete gamma functions, and summing from R² to ∞ gives the desired control. The proof is thus complete.

5.3. Experimental results.
An implementation of this algorithm in Python, publicly available [22], allowed us to sample 10 000 points in 2 128 seconds on an 8-core 3 GHz CPU. The same approach can be used for the DPP with the so-called Bergman kernel, which represents the zeros of some Gaussian analytic functions [17].

Figure 1. Peaks and valleys of some densities.