Integrated quantile functions: properties and applications

In this paper we provide a systematic exposition of basic properties of integrated distribution and quantile functions. We define these transforms in such a way that they characterize any probability distribution on the real line and are Fenchel conjugates of each other. We show that uniform integrability, weak convergence and tightness admit a convenient characterization in terms of integrated quantile functions. As an application we demonstrate how some basic results of the theory of comparison of binary statistical experiments can be deduced using integrated quantile functions. Finally, we extend the area of application of the Chacon–Walsh construction in the Skorokhod embedding problem.


Introduction
Integrated distribution and quantile functions, or simple transformations of them, play an important role in probability theory, mathematical statistics, and their applications such as insurance, finance, and economics. They frequently appear in the literature, often under different names. Moreover, on many occasions they are defined under the additional assumption of integrability of a random variable, or at least integrability of its positive or negative part. Let us point out only a few references. For a random variable X, let F_X be the distribution function of X and q_X any quantile function of X. Examples of integrated distribution functions or their simple modifications, considered, e.g., in [11], include:
• The Conditional Value at Risk CV@R_X(u) := (1/u) ∫_0^u q_X(s) ds, u ∈ (0, 1], see, e.g., [22,23], also called the Average Value at Risk [11], the expected shortfall, or the expected tail loss. Again, these transforms characterize the distribution of X if either E[X^-] < ∞ or E[X^+] < ∞; otherwise, they are identically equal to +∞ or −∞.

The main goal of this paper is a systematic exposition of basic properties of integrated distribution and quantile functions. In particular, we define the integrated distribution and quantile functions for any random variable X in such a way that each of these functions uniquely determines the distribution of X. Further, we show that such important notions of probability theory as uniform integrability, weak convergence and tightness can be characterized in terms of integrated quantile functions (see Section 3). In Section 4 we show how some basic results of the theory of comparison of binary statistical experiments can be deduced using our results in the previous two sections. Finally, in Section 5 we extend the area of application of the Chacon–Walsh construction in the Skorokhod embedding problem with the help of integrated quantile functions.
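The integral in the definition of CV@R can be approximated from a sample by averaging the lowest order statistics. The following sketch (our own illustration, not part of the paper; the function name and the choice of the uniform law are ours) checks this for the uniform distribution on (0, 1), for which q_X(s) = s and hence CV@R_X(u) = u/2.

```python
import numpy as np

def cvar(sample, u):
    # Empirical CV@R_X(u) = (1/u) * int_0^u q_X(s) ds, approximated by the
    # mean of the lowest floor(u * n) order statistics of the sample.
    xs = np.sort(sample)
    k = max(1, int(np.floor(u * len(xs))))
    return xs[:k].mean()

# For the uniform law on (0, 1), q_X(s) = s, so CV@R_X(0.5) = 0.25.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200_000)
print(abs(cvar(x, 0.5) - 0.25) < 0.01)  # True
```

Sign conventions for CV@R vary across the literature; the sketch follows the formula displayed above, which averages the lowest quantiles.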
One of the key points of our approach is that we define integrated distribution and quantile functions as Fenchel conjugates of each other. This is due to the fact that their derivatives, distribution functions and quantile functions, are generalized inverses of each other (see, e.g., [8,11]). This convex duality result can be found in [21] and [11, Lemma A.26], and constitutes implicitly one of the two main results in [22,23].
Let us note that we consider only univariate distributions in this paper. However, it is reasonable to mention a possible generalization to the multidimensional case based on ideas from optimal transport. The integrated quantile function of a random variable X, as it is defined in our paper, is a convex function whose gradient pushes forward the uniform distribution on (0, 1) into the distribution of X; moreover, the integrated distribution function is the Fenchel transform of the integrated quantile function, and its gradient pushes forward the distribution of X into the uniform distribution on (0, 1) if the distribution of X is continuous. In the multidimensional case the existence of such functions follows from McCann's theorem [18]. Namely, if µ is a distribution on R^d, then there exists a (unique up to an additive constant) convex function V whose gradient pushes forward the uniform distribution on the unit cube (or, say, the unit ball) in R^d into µ. Additionally, if µ vanishes on Borel subsets of Hausdorff dimension d − 1, then the gradient of the Fenchel transform V* of V pushes forward µ to the corresponding uniform distribution. We refer to [3,5,9] and [13] for recent advances in this area.
It is more convenient for us to speak about random variables rather than distributions. However, if a probability space is not specified, the symbols P and E for probability and expectation enter into consideration only via distributions of random variables and may refer to different probability spaces. This allows us to replace occasionally random variables by their distributions in the notation.
For the reader's convenience, we recall some terminology and elementary facts concerning convex functions of one real variable. A convex function f : R → (−∞, +∞] is called proper if its effective domain dom f := {x ∈ R : f(x) < +∞} is not empty. The subdifferential ∂f(x) of f at a point x is defined by

∂f(x) := {u ∈ R : f(y) ≥ f(x) + u(y − x) for all y ∈ R}.

If f is a proper convex function and x is an interior point of dom f, then ∂f(x) = [f′_−(x), f′_+(x)], where f′_−(x) and f′_+(x) are the left and right derivatives of f at x respectively. The conjugate of f, or the Fenchel transform, is the function f* on R defined by

f*(u) := sup_{x∈R} {xu − f(x)}, u ∈ R.

The conjugate function is lower semicontinuous and convex. The Fenchel–Moreau theorem says that if f is a proper lower semicontinuous convex function, then f is the conjugate of f*, i.e. f(x) = sup_{u∈R} {xu − f*(u)};
moreover, for x, u ∈ R, we have u ∈ ∂f(x) if and only if x ∈ ∂f*(u), and each of these relations holds if and only if f(x) + f*(u) = xu.

2 Integrated distribution and quantile functions: definitions and main properties

Definition and properties of integrated distribution functions
The distribution function F_X of a random variable X given on a probability space (Ω, F, P) is F_X(x) := P(X ≤ x), x ∈ R. Since F_X is bounded, for any choice of x_0 ∈ R, the integral ∫_{x_0}^x F_X(t) dt is defined and finite for all x ∈ R. In contrast to this case, the function Ψ_X in (1), corresponding to the choice x_0 = −∞, is finite if and only if E[X^-] < ∞.

Definition 1. The integrated distribution function of a random variable X is defined by

J_X(x) := ∫_0^x F_X(t) dt, x ∈ R.

Theorem 1. An integrated distribution function J_X has the following properties: (i) J_X(0) = 0; (ii) J_X is convex, increasing and finite everywhere on R.
(vi) the subdifferential of J_X satisfies ∂J_X(x) = [F_X(x − 0), F_X(x)], x ∈ R. (9) It is clear from (vi) that the integrated distribution function uniquely determines the distribution.
Proof. It is evident that (i) holds and that J_X is finite and increasing. For a < b, we have J_X(b) − J_X(a) = ∫_a^b F_X(t) dt. Since F_X is increasing, the convexity of J_X follows, which, in turn, implies (vi). Next, by Fubini's theorem, we obtain (7). The second equality in (8) is trivial, and the first one follows from (7) if we put a = 0 or b = 0 depending on the sign of x. Let us prove (iv). As x → −∞, J_X(x) decreases to −E[X^-] by the monotone convergence theorem. This proves the first equality in (iv); the second one is proved similarly. Finally, (v) and (vii) follow from (10) and (8) respectively.

Corollary 1. If X is an integrable random variable, then, for any x ∈ R, J_X(x) can be expressed through each of the functions Ψ_X, H_X, and U_X defined in (1)–(3) by explicit affine transformations.

Theorem 2. If a convex function J : R → R satisfies J(0) = 0, lim_{x→−∞} J′_+(x) = 0 and lim_{x→+∞} J′_+(x) = 1, then there exists on some probability space a random variable X for which J_X = J.
Proof. Since J is convex and finite everywhere on the line, it has a right-hand derivative F(x) := J′_+(x) at each point. Due to the convexity of J, F is an increasing and right-continuous function. So we can conclude that F is the distribution function of some random variable X and that J_X = J.

Definition and properties of integrated quantile functions
We call every function q_X : (0, 1) → R satisfying

F_X(q_X(u) − 0) ≤ u ≤ F_X(q_X(u)), u ∈ (0, 1),

a quantile function of a random variable X. The functions q^L_X and q^R_X defined by

q^L_X(u) := inf{x ∈ R : F_X(x) ≥ u}, q^R_X(u) := inf{x ∈ R : F_X(x) > u}, u ∈ (0, 1),

are called the lower (left) and upper (right) quantile functions of X. Of course, the lower and upper quantile functions of X are quantile functions of X, and we always have q^L_X(u) ≤ q_X(u) ≤ q^R_X(u), u ∈ (0, 1), for any quantile function q_X.
It follows directly from the definitions that, for any x ∈ R and u ∈ (0, 1),

q^L_X(u) ≤ x ⟺ u ≤ F_X(x), (11)
x ≤ q^R_X(u) ⟺ F_X(x − 0) ≤ u. (12)

See, e.g., [8,11] for more information on quantile functions (generalized inverses).
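For the empirical distribution of a finite sample, the lower and upper quantile functions can be computed directly from the order statistics. The snippet below (an illustration of ours, not from the paper) shows that the two generalized inverses differ exactly at the atoms of F_X.

```python
import numpy as np

def q_lower(sorted_sample, u):
    # q^L_X(u) = inf{x : F_X(x) >= u} for the empirical distribution function.
    n = len(sorted_sample)
    return sorted_sample[int(np.ceil(u * n)) - 1]

def q_upper(sorted_sample, u):
    # q^R_X(u) = inf{x : F_X(x) > u} for the empirical distribution function.
    n = len(sorted_sample)
    return sorted_sample[min(int(np.floor(u * n)), n - 1)]

xs = np.array([0.0, 0.0, 1.0, 1.0])        # P(X = 0) = P(X = 1) = 1/2
print(q_lower(xs, 0.5), q_upper(xs, 0.5))  # 0.0 1.0: they differ at u = F_X(0)
```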

Definition 2. The Fenchel transform of the integrated distribution function of a random variable X,

K_X(u) := sup_{x∈R} {xu − J_X(x)}, u ∈ R,

is called the integrated quantile function of X.
This definition is motivated by the fact mentioned in the introduction, that a function whose derivative is a quantile function must coincide with the Fenchel transform of J X up to an additive constant. The next theorem clarifies this point.
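Definition 2 can be checked numerically: computing the supremum sup_x {xu − J_X(x)} over a grid recovers the integral of the quantile function. In the sketch below (our illustration; the grids and the uniform example are arbitrary choices) X is uniform on (0, 1), so that J_X(x) = x²/2 on [0, 1] and the integrated quantile function is K_X(u) = u²/2 on [0, 1].

```python
import numpy as np

# X uniform on (0,1): J_X(x) = 0 for x < 0, x^2/2 on [0,1], x - 1/2 for x > 1.
def J(x):
    return np.where(x < 0, 0.0, np.where(x <= 1, x**2 / 2, x - 0.5))

xs = np.linspace(-2.0, 3.0, 5001)

def K(u):
    # Fenchel transform K_X(u) = sup_x {x*u - J_X(x)}, taken over the grid xs.
    return np.max(xs * u - J(xs))

us = np.linspace(0.0, 1.0, 11)
ks = np.array([K(u) for u in us])
print(np.allclose(ks, us**2 / 2, atol=1e-4))  # True: K_X(u) = u^2/2
```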
Theorem 3. An integrated quantile function K_X has the following properties: (i) The function K_X is convex and lower semicontinuous. It takes finite values on (0, 1) and equals +∞ outside [0, 1].
(ii) The Fenchel transform of K_X is J_X, i.e., for any x ∈ R,

J_X(x) = sup_{u∈R} {xu − K_X(u)}. (14)

(iii) inf_{u∈R} K_X(u) = 0, and the infimum is attained at u if and only if u ∈ [F_X(0 − 0), F_X(0)].

(iv) K_X(u) = ∫_{u_0}^u q_X(s) ds for every u ∈ [0, 1], where u_0 is any zero of K_X.
(vi) The subdifferential of K_X satisfies

∂K_X(u) = [q^L_X(u), q^R_X(u)], u ∈ (0, 1); (17)

in particular, K_X is differentiable at u ∈ (0, 1) if and only if q^L_X(u) = q^R_X(u). It is clear from (ii) and the similar remark after Theorem 1 that the integrated quantile function uniquely determines the distribution.
Proof. Since J_X is a proper convex continuous function, it follows from the definition of K_X and the Fenchel–Moreau theorem that K_X is convex and lower semicontinuous, that (14) holds, and that, for all x, u ∈ R, u ∈ ∂J_X(x) if and only if x ∈ ∂K_X(u), where the last equivalence follows from (9). In particular, K_X(u) = +∞ for u ∉ [0, 1], and, for u ∈ (0, 1), ∂K_X(u) = [q^L_X(u), q^R_X(u)] due to (11) and (12). Thus, we have proved (i), (ii) and (vi). From (14) and (17), we get inf_{u∈R} K_X(u) = 0, and this infimum is attained at u if and only if u ∈ [F_X(0 − 0), F_X(0)]. This constitutes assertion (iii). Now (iv) follows from the preceding statements. Statement (v) follows from the definition of K_X and Theorem 1 (iv). Finally, (vii) follows from the definition of K_X and Theorem 1 (vii).

Corollary 2.
For any random variable X and any x ∈ R,

K_X(F_X(x)) = x F_X(x) − J_X(x).

Proof. Put u := F_X(x) and g(y) := J_X(y) − yu, y ∈ R. According to (9), ∂g(y) = [F_X(y − 0) − u, F_X(y) − u]; in particular, 0 ∈ ∂g(y) if y = x. This means that the function g attains its minimum at x and, hence, K_X(u) = −inf_y g(y) = xu − J_X(x).

Theorem 4. If a convex lower semicontinuous function K : R → [0, +∞] is finite on (0, 1), equals +∞ outside [0, 1], and there is u_0 ∈ [0, 1] such that K(u_0) = 0, then there exists on some probability space a random variable X for which K_X = K.
Proof. Under our assumptions, K(u) = ∫_{u_0}^u q(s) ds for u ∈ (0, 1), where q(u) = K′_−(u), u ∈ (0, 1), is increasing and left-continuous. Let us define a probability space (Ω, F, P) as follows: Ω = (0, 1), F is the Borel σ-field and P is the Lebesgue measure. Put X(ω) := q(ω). Now if G(x) := inf{u ∈ (0, 1) : q(u) > x}, then it is easy to verify that q(u) ≤ x ⟺ G(x) ≥ u, cf. (11). It follows that G is the distribution function of X and, hence, q = q^L_X on (0, 1). This means that the left-hand derivatives of K and K_X coincide on (0, 1). In addition, their minima over this interval are equal to zero. Therefore, K = K_X on (0, 1) and, hence, everywhere on R.
Remark 1. An alternative way to prove Theorem 4 is to introduce the Fenchel transform J of K and to show that J satisfies the assumptions of Theorem 2. However, our proof yields not only a characterization statement of Theorem 4 but also an explicit representation of a random variable with a given integrated quantile function. Of course, this representation (namely, of a random variable with given distribution as its quantile function with respect to the Lebesgue measure on (0, 1)) is well known.
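The representation used in the proof of Theorem 4 — a random variable with a given integrated quantile function realized as X(ω) = q(ω) on ((0, 1), Borel, Lebesgue) — is ordinary inverse-transform sampling. A minimal sketch (ours; the exponential law is an arbitrary choice of example):

```python
import numpy as np

# Inverse-transform sampling: X = q(U) with U uniform on (0, 1), where q is a
# quantile function. For the exponential(1) law, q(u) = -log(1 - u).
rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, 200_000)
x = -np.log(1.0 - u)
print(abs(x.mean() - 1.0) < 0.02)  # True: E[X] = 1 for the exponential(1) law
```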
It is convenient to introduce shifted integrated quantile functions:

K^[0]_X(u) := K_X(u) − K_X(0), K^[1]_X(u) := K_X(u) − K_X(1), u ∈ [0, 1],

defined when K_X(0) = E[X^-] < ∞ and K_X(1) = E[X^+] < ∞ respectively. Now we can express the functions defined in (4)–(6) in terms of shifted integrated quantile functions. If E[X^-] < ∞, then the absolute Lorenz curve coincides with K^[0]_X. Since Ψ_X is obtained from J_X by adding the constant E[X^-] = K_X(0) (Corollary 1), the absolute Lorenz curve is the Fenchel transform of Ψ_X.

Convex orders
Let us recall the definitions of convex orders in the univariate case.
For an arbitrary function ψ : R → R_+, define C_ψ as the space of all continuous functions f : R → R such that sup_x |f(x)|/(1 + ψ(x)) < ∞. Let X and Y be random variables. The convex, increasing convex, and related orders between X and Y are defined by requiring E[f(X)] ≤ E[f(Y)] for all f in the corresponding class of convex functions. The following theorem is well known. We provide a proof which reduces to the duality between integrated distribution and quantile functions.
Theorem 5. Let X and Y be random variables.
(i) If E|X| < ∞ and E|Y| < ∞, then the following statements are equivalent:

Proof. First, let us prove (ii). It is well known (see, e.g., [24]) that the increasing convex order can be expressed through integrated distribution functions. Taking (8) into account, the last condition can be rewritten in terms of J_X and J_Y, which, in turn, is equivalent to a comparison of K_X and K_Y by the definition of the integrated quantile function. The claim follows. Now, (iii) follows from (ii) and the first part of the remark before Theorem 5, and the second part of this remark shows the equivalence (a) ⇔ (d) in (i).

Examples
In this subsection we demonstrate how the developed techniques can be used to derive two elementary well-known inequalities, see [10, p. 152]. This approach allows us to find the distributions at which the corresponding extrema are attained. So the inequalities obtained in this way are sharp.
Example 1. Let X be a random variable with zero mean and finite variance D(X) = σ 2 . It is required to find a sharp upper bound for the probability P(X ≥ t), where t is a fixed positive number.
We solve a converse problem. Namely, let p := P(X ≥ t) be fixed. Our purpose is to find a sharp lower bound for the variance D(X) = E[X²] over all random variables X such that E[X] = 0 and P(X ≥ t) = p.
The above class of distributions has a minimal element with respect to the convex order. Indeed, let Y be a discrete random variable with P(Y = t) = p and P(Y = −tp/(1 − p)) = 1 − p; then E[Y] = 0 and P(Y ≥ t) = p. If X is another random variable with these properties, then K^[1]_Y(0) = 0 and the graph of K^[1]_Y consists of two straight segments. Due to the convexity of integrated quantile functions, this implies K^[1]_Y ≤ K^[1]_X and, hence,

D(X) = E[X²] ≥ E[Y²] = t²p/(1 − p).

Resolving this inequality with respect to p = P(X ≥ t), we obtain the required upper bound

P(X ≥ t) ≤ σ²/(σ² + t²). (18)

To show that the estimate in (18) is sharp it is enough to put p = σ²/(σ² + t²) in the definition of the random variable Y and to check that E[Y²] = σ² and that, for X = Y, equality holds in (18).
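The sharpness of the bound can be verified directly on the two-point law: with p = σ²/(σ² + t²), the random variable Y above has mean zero and variance σ². A quick numerical check (ours; the values of σ² and t are arbitrary):

```python
# Two-point law attaining equality in Cantelli's bound P(X >= t) <= s2/(s2+t^2):
# P(Y = t) = p, P(Y = -t*p/(1-p)) = 1 - p, with p = s2/(s2 + t^2).
sigma2, t = 2.0, 1.5
p = sigma2 / (sigma2 + t**2)
y_low = -t * p / (1 - p)
mean = p * t + (1 - p) * y_low
var = p * t**2 + (1 - p) * y_low**2
print(abs(mean) < 1e-12, abs(var - sigma2) < 1e-12)  # True True
```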
Example 2. Let X be a random variable such that F_X(0) = 0, E[X] = 1, and E[X²] = b < ∞. It is required to find a sharp lower bound for the probability P(X > a), where a ∈ (0, 1) is fixed.
We will proceed in a similar way as in the previous example. Namely, let p := P(X > a) be fixed. Our purpose is to find a sharp lower bound for the second moment E[X²] over all random variables X such that F_X(0) = 0, E[X] = 1 and P(X > a) = p.
The above class of distributions has a minimal element with respect to the convex order. Indeed, let Y be a random variable taking the values a and (1 − (1 − p)a)/p with probabilities 1 − p and p respectively; then F_Y(0) = 0, E[Y] = 1 and P(Y > a) = p. If X is another random variable with these properties, then the graph of K^[1]_Y consists of two straight segments. Due to the convexity of integrated quantile functions, this implies K^[1]_Y ≤ K^[1]_X and, hence, E[Y²] ≤ E[X²]. Resolving this inequality with respect to p = P(X > a), we obtain the required lower bound

P(X > a) ≥ (1 − a)²/(b − a(2 − a)). (19)

The sharpness of the estimate in (19) follows if we put p = (1 − a)²/(b − a(2 − a)) in the definition of the random variable Y and verify that E[Y²] = b and that, for X = Y, equality holds in (19). Note that replacing the right-hand side in (19) by the smaller quantity (1 − a)²/b, we arrive at inequality (7.6) in [10, p. 152].
3 Uniform integrability and weak convergence

Tightness and uniform integrability
In this subsection we study conditions for tightness and uniform integrability of a family of random variables in terms of integrated quantile functions. This is a natural question because both tightness and uniform integrability are determined by the one-dimensional distributions of these variables.
Theorem 6. Let (X α ) be a family of random variables. Then the following statements are equivalent: If α is such that u 0,α < c, then, by the three chord inequality, This proves the tightness of the laws of X α .
Since the implications (iv) ⇒ (ii) and (iii) ⇒ (ii) are obvious, the claim follows.

Theorem 7. A family (X_α) of integrable random variables is uniformly integrable if and only if the family (K_{X_α}) is relatively compact in C[0, 1].

Proof. Let us consider the probability space (Ω, F, P) as in the proof of Theorem 4 and define random variables Y_α(ω) = q^L_{X_α}(ω). Then X_α and Y_α have the same distribution, and it is enough to study the uniform integrability of the family {Y_α}. Without loss of generality, we suppose that X_α = Y_α.
Let us recall that a family {X_α} is uniformly integrable if and only if the expectations E|X_α| are bounded and the integrals E[|X_α| 1_A] are uniformly continuous, i.e., sup_α E[|X_α| 1_A] → 0 as P(A) → 0; moreover, the boundedness of E|X_α| is a consequence of the uniform continuity if the measure P has no atomic part, in particular, in our case. On the other hand, by the Arzelà–Ascoli theorem a set in C[0, 1] is relatively compact if and only if it is uniformly bounded and equicontinuous.
We shall check that the uniform boundedness and the equicontinuity of {K Xα } are equivalent to uniform boundedness of E|X α | and the uniform continuity of E[|X α |1 A ], respectively. In view of the above this is sufficient for the proof of the theorem.

Weak convergence
In this subsection (X_n) is a sequence of random variables.

Theorem 9. The following statements are equivalent:

(i) The sequence (X_n) weakly converges.
(ii) There is a sequence (c n ) of numbers such that, for every u ∈ (0, 1), the sequence (K Xn (u) − c n ) converges to a finite limit.
Moreover, in this case if X is a weak limit of (X n ), then K X (u) = lim n→∞ K Xn (u) for all u ∈ (0, 1).
Remark 2. Under the relevant normalization for all n, the pointwise convergence of K_Xn (resp. K^[1]_Xn) on (0, 1) is sufficient (use Theorem 9, (ii) ⇒ (i)) but not necessary for the weak convergence of X_n.
Theorem 10. Let (X_n) weakly converge and E|X_n| < ∞ (resp. E[X_n^-] < ∞, resp. E[X_n^+] < ∞). Then the following statements are equivalent: (i) The sequence (|X_n|) (resp. (X_n^-), resp. (X_n^+)) is uniformly integrable.

Remark 3. In contrast to Remark 2, a combination of the weak convergence of X_n and the uniform integrability of X_n^- (resp. X_n^+) can be expressed in terms of the shifted integrated quantile functions K_Xn (resp. K^[1]_Xn). For instance, let a sequence (X_n) weakly converge to X and let the sequence (X_n^+) be uniformly integrable. Then the pointwise limit of K^[1]_Xn exists and is continuous on (0, 1]. Conversely, if the functions K^[1]_Xn converge pointwise to a continuous limit on (0, 1], then X_n weakly converges, say, to X (use Theorem 9, (ii) ⇒ (i)). In particular, lim_n K^[1]_Xn(u) = K^[1]_X(u) for any u ∈ (0, 1). Continuity of the limiting function in the left-hand side of the above formula at u = 1 implies lim_{n→∞} E[X_n^+] = lim_{u↑1} K_X(u) = E[X^+].

Proof of Theorems 9 and 10. First, let us suppose that (X_n) weakly converges to X. It is well known that in this case q^L_Xn(u) → q^L_X(u) as n → ∞ for every continuity point u of q^L_X. Put u_{n,0} := F_Xn(0) and u_0 := F_X(0). Assume for the moment that the X_n are uniformly bounded. Then, for any u ∈ [0, 1], K_Xn(u) → K_X(u) by the dominated convergence theorem. Moreover, by Theorem 7 the sequence (K_Xn) is relatively compact in C[0, 1]. Combined with pointwise convergence, this shows that (K_Xn) converges to K_X uniformly on [0, 1]. If no assumptions on X_n are imposed, let us introduce the function g_C(x) := max(min(x, C), −C), C > 0, and define random variables Y_n := g_C(X_n) and Y := g_C(X). Then (Y_n) weakly converges to Y. Hence, K_Yn → K_Y uniformly on [0, 1], as has just been proved. However, K_Yn = K_Xn on [F_Xn(−C), F_Xn(C)] ∋ u_{n,0}, and C can be chosen so that these intervals contain a given segment [α, β] ⊆ (0, 1) for all n, which is possible by tightness. Therefore, K_Xn(u) → K_X(u) uniformly in u on [α, β] ⊆ (0, 1). In particular, (K_Xn) converges pointwise to K_X on (0, 1).
To complete the proof of Theorem 9 it remains to prove implication (ii) ⇒ (i). Let u, v ∈ (0, 1). By the assumption, the sequence K Xn (u) − K Xn (v) converges to a finite limit and, hence, is bounded. By Theorem 6, the laws of X n are tight. Let (X n k ) be a weakly convergent subsequence. It follows from what has been proved that the integrated quantile function K(u) of its limit coincides with lim k→∞ K Xn k (u) for u ∈ (0, 1). Therefore, for all u ∈ (0, 1), This implies that c n k converges to a finite limit and that K(u) is obtained from lim n→∞ (K Xn (u) − c n ) by adding a constant. Since K is an integrated quantile function, this constant is determined uniquely. Thus, K is the same for all weakly convergent subsequences, which means that (X n ) weakly converges.
It is enough to prove Theorem 10 in one of the three cases, for example, for the sequence (X_n^-). Assume (i). Then E[X_n^-] → E[X^-], where X is a weak limit of (X_n). In other words, K_Xn(0) → K_X(0). Thus, we have (ii). Moreover, the sequence (K_{X_n^-}) is equicontinuous. It follows that (K_Xn) converges uniformly on every segment [0, β] ⊆ [0, 1). The implication (iii) ⇒ (ii) is trivial. If (ii) holds, then lim_{n→∞} K_Xn(u) is a continuous function in u ∈ [0, 1). On the other hand, this limit is K_X(u) for u ∈ (0, 1).
, and the sequence (X − n ) is uniformly integrable.

Applications to binary statistical models
The theory of statistical experiments deals with the problem of comparing the information in different experiments. The foundations of the theory were laid by Blackwell [1,2], who first studied the notion of one experiment being more informative than another. Since it is difficult to give an explicit definition of statistical information, the theory of statistical experiments evaluates the performance of an experiment in terms of the set of available risk functions, in general, for arbitrary decision spaces and loss functions. For the theory of statistical experiments we refer to [15,25], and especially to [27,28], where the reader can find the results that we state without proof as well as additional information.
In this paper we consider only binary statistical experiments, or dichotomies, E = (Ω, F , P, P ′ ). It is known that for binary models, it is enough to deal with testing problems, i. e. with tests as decision rules and with the probabilities of errors of the first and the second kinds of a test. Let us introduce some notation. Q is any probability measure dominating P and P ′ , z := dP/dQ and z ′ := dP ′ /dQ are the corresponding Radon-Nikodým derivatives. E, E ′ , and E Q are the expectations with respect to P, P ′ and Q respectively. Note that P(z = 0) = 0 and Z := z ′ /z, where 0/0 = 0 by convention, is the Radon-Nikodým derivative of the P-absolutely continuous part of P ′ with respect to P.
For an experiment E = (Ω, F, P, P′), denote by Φ(E) the set of all test functions ϕ in E, i.e. measurable mappings from (Ω, F) to [0, 1]. It is convenient for us to interpret ϕ(ω) as the probability to accept the null hypothesis P and to reject the alternative P′. Denote by N(E) the set of pairs of error probabilities of the first and second kind available in E, and by r_E(u) the smallest probability of the second kind error given that the probability of the first kind error is at most u. It follows that the set N(E) and the risk function r_E determine each other. In particular, r_E is a continuous convex decreasing function taking values in [0, 1] with r_E(1) = 0. Therefore, by Theorem 4, u ↦ r_E(1 − u) coincides on [0, 1] with an integrated quantile function corresponding to some distribution. The following result determines this distribution and explains why it is natural to use integrated quantile functions for binary models.

Remark 4.
The usual way to prove that the set N(E) is closed is based on the weak compactness of test functions, see, e.g., [16]. The reader may readily verify that the closedness of N(E) follows directly from the above proof.
Let us also introduce the minimum Bayes risk function (the error function) b_E(π) := inf_{ϕ∈Φ(E)} {π E[1 − ϕ] + (1 − π) E′[ϕ]}, π ∈ [0, 1]. It can be expressed in terms of the risk function r_E and vice versa. In particular, it follows from Theorem 1 that the corresponding identity holds, see [28, p. 590]. Here we have used that J_{−Z}(x) = x for x ≥ 0.
Finally, let us introduce one more characteristic of binary models, namely the distribution of the 'likelihood ratio' Now let us present some basic notions and results from the theory of comparison of dichotomies. All these facts are well known, see e. g. [27,Chapter 3] and [28,Chapter 10]. Our aim is to show how they can be deduced with the help of the results in Sections 2 and 3.
Definition 3. Let E = (Ω, F, P, P′) and Ẽ = (Ω̃, F̃, P̃, P̃′) be two binary experiments. E is said to be more informative than Ẽ, denoted by E ⊇ Ẽ, if r_E(u) ≤ r_Ẽ(u) for all u ∈ [0, 1]; E and Ẽ are equivalent (E ∼ Ẽ) if r_E = r_Ẽ. The type of an experiment is the totality of all experiments which are equivalent to the given experiment.
Corollary 3. Let E and Ẽ be binary experiments. The following statements are equivalent: (iii) The mapping E ↦ µ_E is onto the set of all probability measures µ on (R_+, B(R_+)) such that ∫ x µ(dx) ≤ 1.
Proof. (i) Let r(u), u ∈ [0, 1], be a convex continuous decreasing function with r(1) = 0 and r(0) ≤ 1. Let Ω = [0, 1] and let F be the Borel σ-field. Define P on (Ω, F) as the Lebesgue measure and P′ as the measure with the distribution function F given by (28). Then Z(u) = F′_−(u) P-a.s. As in the proof of Theorem 4, it follows that r(1 − u) = K_Z(u). Proposition 1 allows us to conclude that r_E = r.
(iii) First, it is evident that µ_E is a probability measure on (R_+, B(R_+)) such that ∫ x µ_E(dx) ≤ 1 for any dichotomy E. Now, let µ be a probability measure on (R_+, B(R_+)) such that ∫ x µ(dx) ≤ 1. Put Ω = [0, +∞] and let F be the Borel σ-field. Define P as the probability measure which coincides with µ on Borel subsets of R_+. Finally, define P′ by P′(A) := ∫_A x µ(dx) for Borel A ⊆ R_+ and P′({+∞}) := 1 − ∫ x µ(dx). If E is defined as E = (Ω, F, P, P′), it is clear that µ_E = µ.
(ii) First, it follows from the definition of the error function that 0 ≤ b_E(π) ≤ π ∧ (1 − π), π ∈ [0, 1], and that b_E is concave. If b is a function with these properties, then define J(x) := x − (1 + x) b(1/(1 + x)), x ≥ 0, cf. (26); put also J(x) := 0 for x < 0. Using the concavity of b, it is easy to check that J is convex on R_+. Since b(0) = 0, we have lim_{x→+∞} J(x)/x = 1. The inequalities 0 ≤ b(π) ≤ 1 − π imply that 0 ≤ J(x) ≤ x for all x ≥ 0. In particular, J is convex on R and, by Theorem 2, J is the integrated distribution function of some nonnegative random variable Z. Finally, the inequality b(π) ≤ π implies that J(x) ≥ x − 1, which means that E[Z] ≤ 1 by Theorem 1. Hence, b is the error function of an experiment E such that µ_E = Law(Z).
Let us note that the proofs of (i) and (iii) give more than is stated. Starting with a function r or a measure µ from the corresponding classes, we construct an experiment such that its risk function (resp. the distribution of the likelihood ratio) coincides with r (resp. µ). Now, if we start in (i) with the risk function r_E of an experiment E, we obtain a new experiment, say κ(E), equivalent to E. Moreover, experiments E_1 and E_2 are equivalent if and only if κ(E_1) = κ(E_2). In other words, the rule E ↦ κ(E) is a representation of binary experiments. Another representation is given in the proof of (iii).
It is easy to check that E ⊇ Ẽ if and only if δ_2(E, Ẽ) = 0. Hence, E ∼ Ẽ if and only if Δ_2(E, Ẽ) = 0. It is also easy to check that δ_2 and Δ_2 satisfy the triangle inequality and, hence, Δ_2 is a metric on the space of types of experiments. We shall see after the next proposition that this metric space is compact.
Proposition 4. Let E and Ẽ be binary experiments. The following statements are equivalent: Proof. (i) ⇔ (ii) follows immediately from Definition 4, so our goal is to prove (ii) ⇔ (iii) using the dual relations (13) and (14). A direct proof of (i) ⇔ (iii) can be found in [27].
To simplify the notation, put K := K_Z and J := J_Z, while the corresponding functions in the experiment Ẽ are denoted by K̃ and J̃. Since r_E(u) = K(1 − u) by (29), (ii) is equivalent to (30). In turn, it follows from (26) that (30) is equivalent to (32). Since J(x) = 0 for x ≤ 0, we have K(u) = sup_{x≥0} {xu − J(x)}, and similarly for K̃. Thus, (31) follows from (32) for u ≥ 0. Conversely, let (31) hold true, and let x > 0 be such that F_Z(x − 0) ≥ ε/2, where Z is the Radon–Nikodým derivative of the P-absolutely continuous part of P′ with respect to P. Then (32) follows, where the last equality in the computation uses the fact that the supremum in (14) is attained at u ∈ [F_Z(x − 0), F_Z(x)], cf. (17). It remains to note that if x is such that F_Z(x − 0) < ε/2, then J(x) ≤ εx/2 and (32) is obviously true.
As a consequence, we obtain the following expressions for δ_2(E, Ẽ) and Δ_2(E, Ẽ), see [27] and [28, p. 604], where L(·, ·) is the Lévy distance between distribution functions, F is defined as in (28) with r = r_E, and F̃ is defined similarly with r = r_Ẽ.
The subset of concave functions b on [0, 1] satisfying 0 ≤ b(π) ≤ π ∧ (1 − π) is clearly closed with respect to uniform convergence and is equicontinuous. By the Arzelà–Ascoli theorem, this subset is compact in the space C[0, 1] with the sup-norm. Therefore, the space of types of experiments is a compact metric space with the Δ_2-metric.
Proposition 5. Let E and E n , n ≥ 1, be binary experiments. The following statements are equivalent: (ii) r E n converges to r E pointwise on (0, 1].
(iv) µ E n weakly converges to µ E .
Proof. The equivalences (i) ⇔ (ii) and (i) ⇔ (iii) follow from Corollary 4, and the equivalence of (ii), (ii ′ ), and (iv) is a consequence of Theorem 10 and Proposition 1. However, we prefer to give a direct proof of the equivalence (i) ⇔ (ii) without using the Lévy distance.
Assume (i). By (29), Passing to the limit as n → ∞, we get Combining these inequalities, we obtain lim n→∞ r E n (u) = r E (u) for 0 < u < 1.
Since risk functions vanish at 1, the convergence holds for u = 1 as well. Now the converse implication (ii) ⇒ (i) is proved by standard compactness arguments.

Chacon-Walsh revisited
The Skorokhod embedding problem was posed and solved by Skorokhod [26] in the following form: given a centered distribution µ with finite second moment, find a stopping time T such that E[T] < ∞ and Law(B_T) = µ, where B = (B_t)_{t≥0}, B_0 = 0, is a standard Brownian motion. Chacon and Walsh [4] suggested constructing T as the limit of an increasing sequence of stopping times T_n, each being the first exit time (after the previous one) of B from a compact interval. This construction has a simple graphical interpretation in terms of the potential functions of B_{T_n} (we recall that potential functions are defined in (3)).
Cox [6] extends the Chacon-Walsh construction to a more general case. He considers a Brownian motion B = (B t ) t≥0 with a given integrable starting distribution µ 0 for B 0 and a general integrable target distribution µ. A solution T (such that Law(B T ) = µ) must be found in the class of minimal stopping times.
It is easy to observe that the Chacon–Walsh construction has a graphical interpretation in terms of integrated quantile functions as well; moreover, in our opinion, the picture is simpler. We give alternative proofs of the result in [4] and of some results in [6]. Moreover, we construct a minimal stopping time in some special case where µ_0 and µ may be non-integrable.
Let us recall the definition of the balayage. For a probability measure µ on R and an interval I = (a, b), −∞ < a < b < +∞, the balayage µ_I of µ on I is defined as the measure which coincides with µ outside [a, b], vanishes on (a, b), and whose masses at a and b satisfy

µ_I({a}) + µ_I({b}) = µ([a, b]), a µ_I({a}) + b µ_I({b}) = ∫_{[a,b]} x µ(dx). (34)

Since ∫_{[a,b]} µ_I(dx) = ∫_{[a,b]} µ(dx) and ∫_{[a,b]} x µ_I(dx) = ∫_{[a,b]} x µ(dx), the balayage µ_I is a probability measure and has the same mean as µ (if defined). It follows that, if B = (B_t)_{t≥0} is a continuous local martingale with ⟨B, B⟩_∞ = ∞ a.s. (e.g. a Brownian motion), µ is the distribution of B_S, where S is a stopping time, and the stopping time T is defined as the first exit time of B from (a, b) after S, i.e. T := inf{t ≥ S : B_t ∉ (a, b)}, then T < +∞ a.s. and the distribution of B_T is the balayage µ_I (see Fig. 6 for the graphs of the shifted integrated quantile functions K^[1]_X and K^[1]_Y, where the distribution of Y is the balayage of the distribution of X). Let X and Y be random variables with the distributions µ and µ_I respectively. It is clear that the distributions of X and Y coincide outside [a, b]. Moreover, the second equality in (34) can be rewritten in terms of integrated quantile functions. This allows us to describe how to obtain the integrated quantile function of Y from that of X: draw the tangent lines with slopes a and b to the graph of K_X, and replace the piece of the graph between the points where it meets these lines by the corresponding segments of the lines. If the point of intersection of these lines lies below the horizontal axis, then shift the resulting graph vertically upwards so that this point comes to the horizontal axis.
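The balayage of a discrete measure is easy to compute exactly: mass inside [a, b] is swept to the endpoints so that total mass and mean are preserved. A sketch in exact rational arithmetic (our own helper, not from the paper; the example measure is arbitrary):

```python
from fractions import Fraction as Fr

def balayage(points, interval):
    # Balayage of a discrete measure {point: mass} on I = (a, b): mass in
    # [a, b] is moved to the endpoints, preserving total mass and mean.
    a, b = interval
    out, mass_a, mass_b = {}, Fr(0), Fr(0)
    for x, m in points.items():
        if a <= x <= b:
            lam = (b - x) / (b - a)      # split so that x = lam*a + (1-lam)*b
            mass_a += lam * m
            mass_b += (1 - lam) * m
        else:
            out[x] = out.get(x, Fr(0)) + m
    if mass_a: out[a] = out.get(a, Fr(0)) + mass_a
    if mass_b: out[b] = out.get(b, Fr(0)) + mass_b
    return out

mu = {Fr(0): Fr(1, 2), Fr(2): Fr(1, 2)}  # mean 1
nu = balayage(mu, (Fr(-1), Fr(1)))
print(sum(nu.values()) == 1, sum(x * m for x, m in nu.items()) == 1)  # True True
```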
Moreover, the last step is not needed if we deal with the shifted integrated quantile functions K^[1]_X and K^[1]_Y. In particular, K^[1]_Y(u) ≤ K^[1]_X(u) for all u ∈ [0, 1]. The next lemma is a key tool in our future construction.

Lemma 2. Let X and Y be random variables such that E[X^+] < ∞, E[Y^+] < ∞, and K^[1]_Y(u) ≤ K^[1]_X(u) for all u ∈ [0, 1]. Fix v ∈ (0, 1). Then there is a random variable Z such that K^[1]_Z(v) = K^[1]_Y(v) and the distribution of Z is a balayage of the distribution of X.
Proof. Without loss of generality, we may assume that K^[1]_Y(v) < K^[1]_X(v). Consider equation (36). The maximum of its left-hand side over x equals K^[1]_X(v) and is greater than the right-hand side; moreover, it is attained at x ∈ [q^L_X(v), q^R_X(v)]. Further, applying Theorem 1 (iv)–(v), we see that the left-hand side of (36) tends to −∞ as x → ±∞. Since the left-hand side of (36) is a concave function in x, equation (36) has two solutions, a < q^L_X(v) and b > q^R_X(v), i.e. F_X(a) < v < F_X(b − 0). Using Corollary 2, rewrite equation (36) in an equivalent form. This equality for x = a (resp. x = b) says that the straight line with slope a (resp. b) passing through the point (v, K^[1]_Y(v)) meets the curve K^[1]_X at the point whose first coordinate is F_X(a) (resp. F_X(b)). Due to (17), these straight lines are tangent to the curve K^[1]_X. Comparing with Lemma 1, we conclude that a random variable Z whose distribution is the balayage of the distribution of X on I = (a, b) satisfies all the requirements.
From now on, we assume that we are given a filtered probability space (Ω, F, (F_t)_{t≥0}, P) and an (F_t, P)-Brownian motion B = (B_t)_{t≥0} with an arbitrary initial distribution. For c > 0, let

The next lemma is inspired by Theorem 5 in [7]. Let us also recall that a stopping time T is minimal if any stopping time R ≤ T with Law(B_R) = Law(B_T) satisfies R = T a.s.

Theorem 11. Let µ_0 and µ be distributions on R such that ∫_R x^+ µ(dx) < ∞ and ∫_R (x − y)^+ µ_0(dx) ≤ ∫_R (x − y)^+ µ(dx) for all y ∈ R.
Let B be a Brownian motion with the initial distribution Law(B_0) = µ_0. Then there is an increasing sequence of stopping times 0 = T_0 ≤ T_1 ≤ · · · ≤ T_n ≤ · · · such that T := lim_{n→∞} T_n is a minimal a.s. finite stopping time, the distribution of B_{T_n} is a balayage of the distribution of B_{T_{n−1}} for each n = 1, 2, . . ., and Law(B_T) = µ.
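The hypothesis of Theorem 11 (by Remark 5, the increasing convex order µ_0 ≤_icx µ) can be checked mechanically for finitely supported measures: the difference of the two stop-loss functions y ↦ ∫ (x − y)^+ dµ is piecewise linear with kinks only at the atoms, vanishes at +∞, and tends to the difference of the means at −∞, so finitely many tests suffice. A sketch (function names ours):

```python
from fractions import Fraction

def stop_loss(mu, y):
    """E[(X - y)^+] for a finitely supported distribution {x: p}."""
    return sum(p * max(x - y, 0) for x, p in mu.items())

def icx_leq(mu0, mu1):
    """Check mu0 <=_icx mu1, i.e. stop_loss(mu0, y) <= stop_loss(mu1, y)
    for every real y. Since the difference is piecewise linear with
    kinks only at the atoms, it is enough to compare the means (the
    y -> -infinity limit of the difference) and the values at the atoms."""
    mean0 = sum(x * p for x, p in mu0.items())
    mean1 = sum(x * p for x, p in mu1.items())
    if mean0 > mean1:
        return False
    return all(stop_loss(mu0, y) <= stop_loss(mu1, y)
               for y in set(mu0) | set(mu1))

mu0 = {x: Fraction(1, 4) for x in range(4)}   # uniform on {0, 1, 2, 3}
mu1 = {0: Fraction(1, 2), 3: Fraction(1, 2)}  # its balayage on (0, 3)
assert icx_leq(mu0, mu1)       # the hypothesis of Theorem 11 holds
assert not icx_leq(mu1, mu0)   # and fails in the opposite direction
```

The failure in the opposite direction is visible already at y = 1, where the stop-loss of the balayage (= 1) exceeds that of the uniform measure (= 3/4).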
Then B_{T_n} has the same distribution as X_n. If P(T = ∞) = δ > 0, then the limit of the expression on the left in the last inequality is greater than or equal to cδ, which is greater than the right-hand side if c is large enough. This contradiction proves that T < ∞ a.s. It follows that B_{T_n} converges a.s. to B_T and, hence, Law(B_T) = µ. It remains to prove that T is a minimal stopping time. According to Theorem 4.1 in [12], it is enough to find a one-to-one function G such that the stopped process G(B)^T is a closed submartingale.
Let g(x), x ∈ R, be a continuously differentiable function with the following properties: g ≡ 1 on [0, +∞), g is strictly positive and increasing on (−∞, 0], ∫_{−∞}^0 g(x) dx < ∞, and g′(x) ≤ 1 for all x. Put G(y) := ∫_0^y g(x) dx; then, in particular, G is strictly increasing and bounded from below, and G(y) = y for y ≥ 0. By Itô's formula,

G(B_t) = G(B_0) + M_t + A_t,   where M_t = ∫_0^t g(B_s) dB_s and A_t = (1/2) ∫_0^t g′(B_s) ds;

here G(B_0) is an integrable random variable, M is a local martingale, and [M, M]_t = ∫_0^t g²(B_s) ds ≤ t. Hence, by the Burkholder–Davis–Gundy inequality, sup_{s≤t} |M_s| is integrable; in particular, M is a martingale. Finally, A is an increasing process and A_t ≤ t/2. Therefore, G(B) is a submartingale and, hence, so are the stopped processes G(B)^T and G(B)^{T_n}. Note that, by construction, the process (B − B_0)^{T_n} is bounded (by the sum of the lengths of the intervals I_k, k ≤ n) for a fixed n. Hence, G(B_{t∧T_n}) ≤ B^+_{t∧T_n} ≤ B^+_0 + sup_{s≤T_n} |B_s − B_0|. We conclude that the submartingale G(B)^{T_n} is uniformly integrable; hence, G(B_{t∧T_n}) ≤_icx G(B_{T_n}) for any n and t. On the other hand, B^+_{T_n} ≤_icx B^+_T. Combining, we get [G(B_{t∧T_n})]^+ ≤_icx B^+_T for any n and t. Passing to the limit as n → ∞ in this inequality shows that the family [G(B_{t∧T})]^+, t ∈ R_+, is uniformly integrable. The claim follows.
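The proof only lists the required properties of g; one concrete admissible choice (our example, not taken from the paper) is g(x) = e^{−x²} for x ≤ 0 and g ≡ 1 on [0, +∞): it is C¹ (both one-sided derivatives vanish at 0), strictly positive and increasing on (−∞, 0], integrable at −∞ (the integral over (−∞, 0] equals √π/2), and g′ ≤ √(2/e) < 1. A numerical spot check:

```python
import math

def g(x):
    """g(x) = exp(-x^2) for x < 0, and g(x) = 1 for x >= 0."""
    return math.exp(-x * x) if x < 0 else 1.0

def g_prime(x):
    """Derivative of g; its maximum over x < 0 is sqrt(2/e) < 1,
    attained at x = -1/sqrt(2)."""
    return -2.0 * x * math.exp(-x * x) if x < 0 else 0.0

def G(y, n=10000):
    """G(y) = integral of g over [0, y], by the trapezoid rule
    (a numeric sketch of the primitive used in the proof)."""
    h = y / n
    return h * (0.5 * g(0.0) + sum(g(i * h) for i in range(1, n)) + 0.5 * g(y))

# spot checks of the properties required in the proof, on a grid
xs = [i / 10 for i in range(-100, 101)]
assert all(g(x) > 0 for x in xs)                     # strictly positive
assert all(0.0 <= g_prime(x) <= 1.0 for x in xs)     # increasing, g' <= 1
assert all(g(xs[i]) <= g(xs[i + 1]) + 1e-15 for i in range(len(xs) - 1))
assert abs(g_prime(-1 / math.sqrt(2)) - math.sqrt(2 / math.e)) < 1e-12
assert abs(G(5.0) - 5.0) < 1e-9                      # G(y) = y for y >= 0
```

Since g ≡ 1 on [0, +∞), this G is the identity on [0, +∞) and is bounded from below by −√π/2, exactly as the proof requires.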
Remark 5. It has already been mentioned in the proof of Theorem 5 that the assumptions on µ_0 and µ in Theorem 11 are equivalent to µ_0 ≤_icx µ.

Remark 6. Let µ_0 and µ satisfy the assumptions of Theorem 11. As a by-product, we have obtained the following classical characterization of the increasing convex order: there exist random variables X_0 and X defined on the same probability space such that Law(X_0) = µ_0, Law(X) = µ, and X_0 ≤ E[X | X_0] a.s. Indeed, take X_0 = B_0 and X = B_T and use the uniform integrability of (B^+_{T_n})_{n≥1} to obtain the desired inequality.
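The coupling in Remark 6 can be made explicit in a toy discrete case. Classically (Strassen), µ_0 ≤_icx µ is equivalent to the existence of X_0, X with these laws and X_0 ≤ E[X | X_0] a.s.; for µ_0 uniform on {0, 1, 2, 3} and µ its balayage on (0, 3), the following one-step kernel (our example) already does the job, with equality E[X | X_0] = X_0:

```python
from fractions import Fraction

# Given X0 = x, let X = 3 with probability x/3 and X = 0 otherwise.
# Then E[X | X0] = X0 (a martingale coupling), and the law of X is the
# balayage of the law of X0 on (0, 3): mass 1/2 at each of 0 and 3.
mu0 = {x: Fraction(1, 4) for x in range(4)}
kernel = {x: {3: Fraction(x, 3), 0: 1 - Fraction(x, 3)} for x in mu0}

# conditional means E[X | X0 = x] and the marginal law of X
cond_mean = {x: sum(y * q for y, q in kernel[x].items()) for x in mu0}
law_X = {}
for x, p in mu0.items():
    for y, q in kernel[x].items():
        law_X[y] = law_X.get(y, Fraction(0)) + p * q

assert all(cond_mean[x] == x for x in mu0)           # X0 <= E[X | X0] holds
assert law_X == {0: Fraction(1, 2), 3: Fraction(1, 2)}
```

Because the balayage preserves the mean, the inequality X_0 ≤ E[X | X_0] holds here with equality; a genuine submartingale coupling appears once µ has a strictly larger mean than µ_0.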