Modern Stochastics: Theory and Applications



Average precision at cutoff k under random rankings: expectation and variance
Tetiana Manzhos, Tetiana Ianevych, Olga Melnyk

https://doi.org/10.15559/26-VMSTA298
Pub. online: 2 April 2026. Type: Research Article. Open Access.

Received: 2 November 2025
Accepted: 28 February 2026
Published: 2 April 2026

Notes

This paper is dedicated to the 85th anniversary of the birth of Prof. Yuriy Kozachenko.

Abstract

Recommender systems and information retrieval platforms rely on ranking algorithms to present the most relevant items to users, thereby improving engagement and satisfaction. Assessing the quality of these rankings requires reliable evaluation metrics. Among them, Mean Average Precision at cutoff k (MAP@k) is widely used, as it accounts for both the relevance of items and their positions in the list, aggregated over groups of users.
It seems obvious that intelligent ranking algorithms should outperform recommendations generated at random. But how can we measure how much better they are? In this article, we establish the expected value and variance of Average Precision at cutoff k (AP@k), which can serve as a foundation for efficiency criteria based on MAP@k. We consider two widely used evaluation models, offline and online, together with the corresponding randomization models, and calculate the expected value and variance of AP@k in both cases. A numerical study for different scenarios is also performed.

1 Introduction

Recommendation systems play a pivotal role in modern digital environments, designed to predict and suggest items such as movies on streaming platforms, products on e-commerce sites, or news articles on media platforms based on users’ preferences and historical interactions. These systems leverage sophisticated algorithms that analyze user behavior, item attributes, and contextual data to provide personalized recommendations, thereby enhancing user satisfaction and engagement.
Ranking algorithms are fundamental to recommendation systems, as they determine the order in which items are presented to users. These algorithms prioritize relevant items and arrange them in a sequence designed to maximize user engagement and satisfaction.
Metrics that assess the effectiveness of these rankings are essential for evaluating algorithmic performance. Among various evaluation metrics, Average Precision at k (AP@k) plays a crucial role in recommendation systems due to its ability to assess the precision of recommended items up to a specified rank k. It not only accounts for the relevance of the items but also their position in the recommendation list, providing a comprehensive evaluation of ranking effectiveness. A review of this and other relevant metrics is provided in [7, 9], offering a deeper exploration of the evaluation methods.
AP@k is particularly important for evaluations of algorithms developed for scenarios where users are more likely to interact with top-ranked items, like book propositions on Amazon, short videos on YouTube, or news on Google News. While AP@k evaluates the ranking quality for individual users, MAP@k provides an overall measure of the algorithm’s performance across the entire user base, giving a more generalized assessment of the system’s ability to rank relevant items consistently.
There is a variety of ranking methods, but none of them is universally the best. The effectiveness of a ranking algorithm depends on the particular problem, the data available, what the ranking is meant to represent, and what properties are required. The most well-known so far is the PageRank method used in search engines like Google (for details, see [4, 5] and the references therein).
In this paper, we derive closed-form expressions for the expected value and the variance of AP@k under random rankings. Since MAP@k averages user-level AP@k, the expectation of AP@k specifies the random baseline for the metric MAP@k, whereas the variance determines the scale of random fluctuations around that baseline. For a good ranking algorithm, MAP@k is expected to be significantly greater than the expectation of AP@k. This baseline supports principled benchmarking and guides the optimization of recommendation strategies. Together, these basic probabilistic parameters can be used for testing hypotheses on whether an algorithm is substantially better than random ranking.
The paper is organized as follows. Section 2 introduces the two evaluation settings that we consider (offline and online) in connection with the corresponding randomization models and explains their practical relevance. Section 3 provides formal definitions of AP@k and MAP@k and clarifies their relationship. Sections 4 and 5 present closed-form expressions for the expectation and variance of AP@k in the offline and online scenarios, respectively. Section 6 presents numerical examples, simulation results, and comparative tables that illustrate the particularities of AP@k in several scenarios, whereas Section 7 summarizes the main findings and suggests directions for future research. The proofs of the theorems from Sections 4 and 5 are placed in the Appendix.

2 Offline and online evaluation

In the evaluation of recommendation systems, two principal approaches are widely used: offline evaluation and online evaluation. A comprehensive discussion of these methods can be found in [1, 6]. In both approaches, we restrict ourselves to treating item relevance as binary — classifying items as relevant or not — but the two differ in methodology.
It is important to note that the notion of relevance itself is not universal and is typically defined by the underlying business objective of the system. Depending on the task, relevance may correspond to different user actions, such as clicking on an item, visiting a product page, completing a purchase, or consuming a certain fraction of the content. For instance, in a recommendation system optimized for traffic generation, a click or page visit may be treated as a relevant outcome, whereas in a revenue-oriented setting, relevance may be defined by a completed transaction. Consequently, while AP@k and MAP@k assume a binary relevance indicator, the definition of this indicator is inherently task-dependent and reflects the specific optimization goal of the application.
Offline evaluation relies on historical data, where the relevance of items for each user is known. It is particularly useful when comprehensive, labeled data, such as historical interactions, is available, making it possible to test recommendation algorithms in a controlled environment. In this case, we can design a training set and a test set according to our preferences. The common practice is to group users who have the same number of relevant items in the list in order to make them homogeneous in some sense and comparable. We can then explore several groups with different numbers m of relevant items in the list.
What if we do not use advanced ranking algorithms but propose items to the user randomly? And what do we mean by ‘randomly’? What will the expected value of AP@k be in this case? First of all, we need to define the random experiment that fits such a scenario. We can consider the experiment in which one random outcome differs from another only in the positions where the m relevant elements (no matter which) are located among the N available places. This is stochastically equivalent to sampling m items from N with equal probabilities and without replacement (SWOR).
Offline evaluation is easier to conduct and reproducible, but it does not always predict online behavior and trends. Therefore, offline evaluation is usually followed by online evaluation. Online evaluation assesses recommender systems by a method called A/B testing, in which part of the users are served by recommender system A and another part by recommender system B in real time, and the two are then compared with each other. We refer the interested reader to [1, 6] for more details on this issue. In A/B testing, one of the systems (e.g., B) can propose items randomly. With time, all recommender algorithms degrade, and the level of their degradation is regularly monitored; but instead of conducting an online test with random sorting of recommendations, which is quite costly and reduces user satisfaction, it can be simulated using the randomization model.
We propose a model that also treats item relevance as binary. But it is unreasonable to require a predefined number of relevant items m within a set of N items per user, as in the offline evaluation framework, since such a model would be unrealistic. Instead, it is important to ensure that every participant in the sample testing the recommender system is shown at least k items from the recommendation list. For reliable evaluation in this scenario, two key conditions must be met: the pool of items used to generate the recommendation list must be sufficiently large, and the proportion of relevant items should remain approximately stable across all users in the treated group. In practice, this means that we need to choose users with similar characteristics, such as gender or age. This comparison is facilitated by the probability p, which represents the likelihood of each item being relevant. The probability p can be estimated from historical data. Given this probability, it is possible to calculate the expected value and the variance of the evaluation metric AP@k under random item ranking.
One possible random experiment that can be considered to fit this setting is Bernoulli sampling. It assumes that we have N items, and every item can be either relevant or not, independently of each other, with a probability of being relevant equal to p. This is equivalent to sampling from N items, where $m=pN$ (m is assumed to be natural) with equal probabilities and with replacement (SWR). The number of relevant items for each user under this model is not fixed, but on average it is equal to $Np=m$.
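As a small illustration (the function names below are ours, not from the paper), the two randomization models can be simulated directly: under SWOR a fixed number m of relevant items is placed uniformly at random among N positions, while under SWR (Bernoulli sampling) every item is relevant independently with probability p. A minimal sketch:

```python
import random

def swor_ranking(N, m, rng):
    # Offline model: exactly m relevant items, with all placements
    # equally likely (a uniformly random permutation of N items).
    ranking = [1] * m + [0] * (N - m)
    rng.shuffle(ranking)
    return ranking

def swr_ranking(N, p, rng):
    # Online model: each item is relevant independently with
    # probability p, so the number of relevant items is random
    # with mean N * p.
    return [1 if rng.random() < p else 0 for _ in range(N)]

rng = random.Random(42)
print(sum(swor_ranking(50, 25, rng)))  # always exactly 25
print(sum(swr_ranking(50, 0.5, rng)))  # around 25 on average, varies per draw
```

The first model fixes the proportion of relevant items deterministically; the second fixes it only in expectation, which is exactly the distinction developed in Sections 4 and 5.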

3 Average Precision at k and Mean Average Precision at k

Having established the context of offline and online evaluation, we now turn to the formal definition of the metrics Average Precision at k (AP@k) and Mean Average Precision at k (MAP@k), which are central to this paper. To begin, we first define AP@k for a single user, which serves as the building block for MAP@k.
The Average Precision at k measures the ranking quality of relevant items proposed to one user by averaging precision values at positions where relevant items appear. Formally, it is defined as:
(1)
\[ AP@k=\frac{1}{k}{\sum \limits_{i=1}^{k}}P@i\cdot {I_{i}},\]
where:
  • k is the length of the recommendation list;
  • $P@i=\frac{{\textstyle\sum _{j=1}^{i}}{I_{j}}}{i}$ is the precision at position i;
  • ${I_{i}}\in \{0,1\}$ is a binary indicator, equal to 1 if the item at position i is relevant and 0 otherwise.
In this general form, the denominator k normalizes the metric over the entire recommendation list shown to the user. However, when the total number of relevant items m is smaller than k, a modified normalization factor, $\min (m,k)$, should be used:
(2)
\[ AP@k=\frac{1}{\min (m,k)}{\sum \limits_{i=1}^{k}}P@i\cdot {I_{i}}.\]
This adjustment ensures that the metric appropriately handles situations where there are fewer relevant items than the length of the recommendation list, avoiding artificially deflated scores. Without this modification, even in the best scenario – where all m relevant items are ranked in the top positions – the AP@k score would never reach 1, because the denominator would be larger than m, preventing the score from reaching its maximum value.
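For concreteness, formula (2) can be implemented in a few lines (a sketch; the helper name `ap_at_k` is ours). The running sum accumulates precision only at the positions i where ${I_{i}}=1$:

```python
def ap_at_k(relevance, k, m=None):
    """AP@k by formula (2): `relevance` is the 0/1 list I_1, ..., I_N
    in ranked order; m is the total number of relevant items
    (by default, the number of ones in `relevance`)."""
    if m is None:
        m = sum(relevance)
    if m == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / i   # P@i * I_i, added only when I_i = 1
    return total / min(m, k)

print(ap_at_k([1, 1, 0, 0, 0], k=5))  # 1.0: both relevant items on top
print(ap_at_k([0, 1, 0, 0, 0], k=5))  # 0.5: the single relevant item at rank 2
```

MAP@k of formula (3) below is then simply the mean of `ap_at_k` over all users.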
As mentioned before, MAP@k aggregates the AP@k values across all users in the dataset and provides a more comprehensive evaluation of the recommendation system, thereby reflecting the system’s overall ranking performance. Formally, MAP@k is computed as:
(3)
\[ MAP@k=\frac{1}{|U|}\sum \limits_{u\in U}AP@{k_{u}},\]
where:
  • • U is the set of all users in the dataset;
  • • $AP@{k_{u}}$ is the $AP@k$ score for user u.
Turning to the interpretation of the MAP@k metric, a high MAP@k value indicates that most users encounter relevant items ranked highly, reflecting an effective recommendation system. Conversely, a low MAP@k suggests that relevant items appear lower, signaling a less efficient algorithm. The metric ranges from 0 to 1, where 1 means all relevant items are within the top-k positions, and 0 means none are included. However, for a comprehensive evaluation, it is essential to compare the MAP@k score with the expected value derived from random recommendation generation. While one might assume that this expected value equals the proportion of relevant items in the collection, this is not the case. As demonstrated by Yves Bestgen in his work [2], the expected average precision for random recommendations is more complex and varies depending on various factors such as the number of relevant items in the list and other characteristics. In the current study, we derive precise formulas for the calculation of random baselines for both random ranking algorithms described above (sampling without and with replacement).

4 Expectation and variance of AP@k under sampling without replacement for the case of offline ranking evaluation

If we want to determine how much the algorithm outperforms random chance, we can compare MAP@k calculated for the developed ranking algorithm with the expected value of AP@k if randomness is imposed by sampling without replacement. The expected AP@k alone is not sufficient for a comprehensive evaluation, but it provides the crucial baseline against which meaningful improvements can be interpreted. In addition, the variance quantifies the spread of AP@k values around this baseline, completing the picture by indicating the scale of random fluctuations.
We begin with the offline evaluation setting described in Section 2, corresponding to sampling without replacement. In this case, among N items, exactly m are relevant, and one random outcome differs from another only by the positions of the m relevant items in the ranking. Equivalently, the random ranking can be viewed as a uniformly random permutation of all N items, so that the m relevant items are placed at random positions. From the mathematical point of view, this means that for each item $j=1,2,\dots ,N$ we define a binary random variable ${I_{j}}$, with ${I_{j}}=1$ if the item at position j is relevant and ${I_{j}}=0$ otherwise. Furthermore, ${\textstyle\sum _{j=1}^{N}}{I_{j}}=m$ and the random variables ${I_{1}},{I_{2}},\dots ,{I_{N}}$ are not independent.
Under this model, the probability that the j-th item is relevant can be calculated as
\[ P({I_{j}}=1)=\frac{{C_{N-1}^{m-1}}}{{C_{N}^{m}}}=\frac{m}{N}.\]
So, $E({I_{j}})=m/N$ for every j and the random variable ${\textstyle\sum _{j=1}^{i}}{I_{j}}$, which counts the number of relevant items among the first i items, follows the Hypergeometric distribution with parameters N, m and i. Moreover
\[ \operatorname{E}(P@i)=\frac{1}{i}{\sum \limits_{j=1}^{i}}\operatorname{E}({I_{j}})=\frac{m}{N},\hspace{1em}i=1,\dots ,k.\]
In this case, we should compute $AP@k$ according to formula (2), where the normalization is made by $\min (m,k)$.
Hereafter, to indicate that the expectations are calculated with respect to different random models, we use the subscripts SWOR and SWR in the theorems’ formulations. In the proofs placed in Appendix A, we omit these subscripts to simplify the formulas.
The explicit expressions for the expected value $\operatorname{E}(AP@k)$ and the variance $\operatorname{Var}(AP@k)$ under the random model corresponding to sampling without replacement (SWOR) are established in the following theorem.
Theorem 1.
Assume that among N available items, exactly m are relevant for each user and their places are chosen randomly with equal probability by sampling without replacement. Then the expectation and the variance of $AP@k$ with respect to all possible random results are equal to
(4)
\[\begin{array}{l}\displaystyle {\operatorname{E}_{SWOR}}(AP@k)=\frac{m}{N\cdot \min (k,m)}\left(\frac{m-1}{N-1}k+\frac{N-m}{N-1}{H_{k}}\right),\\ {} \displaystyle \begin{aligned}{}{\operatorname{Var}_{SWOR}}\big(AP@k\big)=& \frac{1}{\min {(k,m)^{2}}}\frac{m}{N}\Bigg[k(C+2(E-F)+(k-1)G)+\\ {} & +{H_{k}}(B-2(E-kF))+{H_{k}^{2}}\cdot D+{H_{k}^{\left(2\right)}}(A-D)\Bigg],\end{aligned}\end{array}\]
where
\[\begin{aligned}{}A& =1-\frac{m}{N}-\frac{m-1}{N-1}\left(3-2\frac{m-2}{N-2}-\frac{m}{N}\left(2-\frac{m-1}{N-1}\right)\right);\\ {} B& =\frac{m-1}{N-1}\left(3\left(1-\frac{m-2}{N-2}\right)-2\frac{m}{N}\left(1-\frac{m-1}{N-1}\right)\right);\\ {} C& =\frac{m-1}{N-1}\left(\frac{m-2}{N-2}-\frac{m(m-1)}{N(N-1)}\right);\\ {} D& =\frac{m-1}{N-1}\left(2-5\frac{m-2}{N-2}+3\frac{(m-2)(m-3)}{(N-2)(N-3)}\right)-\frac{m}{N}{\left(1-\frac{m-1}{N-1}\right)^{2}};\\ {} E& =\frac{m-1}{N-1}\left(3\frac{m-2}{N-2}\left(1-\frac{m-3}{N-3}\right)-\frac{m}{N}\left(1-\frac{m-1}{N-1}\right)\right);\\ {} F& =\frac{m-1}{N-1}\left(\frac{m-2}{N-2}\left(1-\frac{m-3}{N-3}\right)-\frac{m}{N}\left(1-\frac{m-1}{N-1}\right)\right);\\ {} G& =\frac{m-1}{N-1}\left(\frac{(m-2)(m-3)}{(N-2)(N-3)}-\frac{m}{N}\frac{m-1}{N-1}\right);\\ {} {H_{k}}& ={\sum \limits_{i=1}^{k}}\frac{1}{i};\hspace{1em}{H_{k}^{(2)}}={\sum \limits_{i=1}^{k}}\frac{1}{{i^{2}}}.\end{aligned}\]
Remark 1.
The values ${H_{k}}={\textstyle\sum _{i=1}^{k}}\frac{1}{i}$ are, in fact, the partial sums of the harmonic series; they were named ‘harmonic numbers’ in 1968 by Donald Knuth [8]. For sufficiently large k, the well-known formula (see [3])
\[ {H_{k}}=\ln k+\gamma +\frac{1}{2k}-{\varepsilon _{k}},\]
can be used. Here $\gamma \approx 0.5772$ is the Euler–Mascheroni constant and $0\le {\varepsilon _{k}}\le \frac{1}{8{k^{2}}}$, so ${\varepsilon _{k}}$ tends to 0 as $k\to \infty $.
Remark 2.
For small k, the values ${H_{k}}$ and ${H_{k}^{(2)}}$ can be calculated exactly by ordinary summation.
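Theorem 1 translates directly into code. The sketch below (helper names are ours) evaluates formula (4) and, as a check, reproduces the Scenario A1 value from Section 6 ($N=50$, $m=25$, $k=5$); the last lines also verify the error bound on the harmonic-number approximation from Remark 1:

```python
import math

def harmonic(k, order=1):
    # H_k and H_k^(2): partial sums of the harmonic series.
    return sum(1.0 / i ** order for i in range(1, k + 1))

def e_swor(N, m, k):
    # Expectation of AP@k under SWOR, formula (4).
    return m / (N * min(k, m)) * ((m - 1) / (N - 1) * k
                                  + (N - m) / (N - 1) * harmonic(k))

def var_swor(N, m, k):
    # Variance of AP@k under SWOR, formula (4), with the
    # coefficients A, ..., G exactly as stated in Theorem 1.
    r1, r2 = m / N, (m - 1) / (N - 1)
    r3, r4 = (m - 2) / (N - 2), (m - 3) / (N - 3)
    A = 1 - r1 - r2 * (3 - 2 * r3 - r1 * (2 - r2))
    B = r2 * (3 * (1 - r3) - 2 * r1 * (1 - r2))
    C = r2 * (r3 - r1 * r2)
    D = r2 * (2 - 5 * r3 + 3 * r3 * r4) - r1 * (1 - r2) ** 2
    E = r2 * (3 * r3 * (1 - r4) - r1 * (1 - r2))
    F = r2 * (r3 * (1 - r4) - r1 * (1 - r2))
    G = r2 * (r3 * r4 - r1 * r2)
    Hk, Hk2 = harmonic(k), harmonic(k, 2)
    bracket = (k * (C + 2 * (E - F) + (k - 1) * G)
               + Hk * (B - 2 * (E - k * F))
               + Hk ** 2 * D + Hk2 * (A - D))
    return r1 * bracket / min(k, m) ** 2

print(round(e_swor(50, 25, 5), 5))  # 0.36139, as in Table 2 (Scenario A1)
print(var_swor(50, 25, 5))          # close to the Table 2 value 0.05464

# Remark 1: H_k is approximated by ln k + gamma + 1/(2k) with error <= 1/(8k^2)
approx = math.log(1000) + 0.5772156649 + 1 / 2000
print(abs(harmonic(1000) - approx) < 1 / (8 * 1000 ** 2))  # True
```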

5 Expectation and variance of AP@k under sampling with replacement in the case of online ranking evaluation

We now turn to the online evaluation setting described in Section 2. Here, we assume that each item is relevant to a user with a constant probability p. The randomness can be modeled as a sequence of independent Bernoulli trials, or equivalently, as sampling with replacement (SWR), where each item has probability p of being sampled. In this case, the decision whether a recommendation is relevant is assumed to be independent for each item. Moreover, we assume a uniform relevance probability p for all recommendations and all users. These probabilistic assumptions differ from the offline case and lead to distinct formulas for the expectation and variance of AP@k.
From a mathematical point of view, this means that for each recommendation $j=1,2,\dots ,N$, we define a random variable ${I_{j}}$, with ${I_{j}}=1$ if the item is relevant and ${I_{j}}=0$ otherwise, as in the previous section. But here the random variables ${I_{1}},{I_{2}},\dots ,{I_{N}}$ are independent with mean $\operatorname{E}({I_{j}})=p$ and variance $\operatorname{Var}({I_{j}})=p(1-p)$, where p is assumed to be known. Of course, this assumption is quite strong and can be applied only when the users are relatively similar (homogeneous), as described in Section 2 for the online evaluation scenario.
Under these assumptions and notations, we have the following:
\[ P@i=\frac{1}{i}{\sum \limits_{j=1}^{i}}{I_{j}},\hspace{1em}\operatorname{E}(P@i)=\frac{1}{i}{\sum \limits_{j=1}^{i}}\operatorname{E}({I_{j}})=p,\hspace{1em}\text{and}\hspace{1em}AP@k=\frac{1}{k}{\sum \limits_{i=1}^{k}}P@i\cdot {I_{i}}.\]
The basic benchmarking characteristics for $MAP@k$ under random recommendations corresponding to the sampling with replacement are presented in the following theorem.
Theorem 2.
If, for each user, every item is assumed to be independently relevant with constant probability p and the places for the relevant items in the list of N items are chosen using sampling with replacement with probability p, then the expectation and the variance of $AP@k$ are equal to
(5)
\[\begin{aligned}{}{\operatorname{E}_{SWR}}(AP@k)& =p\left(p+\left(1-p\right)\frac{1}{k}{H_{k}}\right);\\ {} {\operatorname{Var}_{SWR}}(AP@k)& =\frac{5}{k}\cdot {p^{3}}(1-p)\\ {} & +\frac{1}{{k^{2}}}\cdot p(1-p)\left[p(1-2p)(3{H_{k}}+{H_{k}^{2}})+(1-p)(1-3p){H_{k}^{\left(2\right)}}\right].\end{aligned}\]
Here ${H_{k}}={\textstyle\sum _{i=1}^{k}}\frac{1}{i}$ is the k-th harmonic number and ${H_{k}^{\left(2\right)}}={\textstyle\sum _{i=1}^{k}}\frac{1}{{i^{2}}}$ is the k-th partial sum of the second-order harmonic series, as in Theorem 1.
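Formula (5) is equally direct to implement (a sketch with our own function names). A useful sanity check: for $k=1$, $AP@1$ is a Bernoulli(p) variable, so the formulas must reduce to p and $p(1-p)$:

```python
def harmonic(k, order=1):
    # Partial sums of the harmonic series: H_k and H_k^(2).
    return sum(1.0 / i ** order for i in range(1, k + 1))

def e_swr(p, k):
    # Expectation of AP@k under SWR, formula (5).
    return p * (p + (1 - p) * harmonic(k) / k)

def var_swr(p, k):
    # Variance of AP@k under SWR, formula (5).
    Hk, Hk2 = harmonic(k), harmonic(k, 2)
    return (5 / k * p ** 3 * (1 - p)
            + p * (1 - p) / k ** 2
            * (p * (1 - 2 * p) * (3 * Hk + Hk ** 2)
               + (1 - p) * (1 - 3 * p) * Hk2))

print(e_swr(0.5, 1), var_swr(0.5, 1))  # 0.5 0.25, i.e. p and p(1-p) for k = 1
print(round(e_swr(0.5, 5), 5))         # 0.36417 (Table 2 reports 0.36416)
print(round(var_swr(0.5, 5), 5))       # 0.05884, as in Table 2 (Scenario A1)
```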

6 Numerical examples and simulation results

Before presenting some practical results, it is helpful to summarize the key differences in the formulas for the offline and online settings. In the offline case, the distribution of AP@k depends on the total number of items N, the number of relevant items m, and the cut-off k that specifies how many top-ranked positions are taken into account. This reflects the fact that evaluation is performed on a finite set of items, where exactly m out of N are relevant and their positions in the ranking determine the outcome. In the online setting, by contrast, the parameter N does not appear in the formulas, since evaluation is restricted to the top-k recommendations and each of them is modeled as an independent Bernoulli trial with relevance probability p. This makes the outcome depend only on k and p, regardless of the overall size of the candidate pool, which in practice may be very large. Conceptually, the probability p plays a role analogous to the ratio $m/N$ in the offline case: while offline evaluation fixes the proportion of relevant items deterministically, online evaluation assumes it probabilistically.
To cover a representative range of conditions, $N=50$ is fixed, and several choices of m and k are examined. The half-density case ($m=25$) with varying cut-offs illustrates the situations $m\gt k$, $m=k$, and $m\lt k$. Lower prevalences ($m=10$ and $m=2$) are also included along with a high-prevalence setting ($m=35$, corresponding to $p\gt 0.5$). These scenarios span the spectrum from very sparse to very dense relevance distributions, thereby testing the formulas under diverse conditions. The selected configurations are summarized in Table 1.
Table 1.
Examined scenarios
Scenario Offline: $(N,m)$ Online: p k Characteristic
A1 (50, 25) 0.50 5 $m\gt k$, balanced case (50% relevant)
A2 (50, 25) 0.50 25 $m=k$, balanced case (50% relevant)
A3 (50, 25) 0.50 40 $m\lt k$, balanced case (50% relevant)
B (50, 10) 0.20 20 moderate proportion of relevant items
C (50, 2) 0.04 20 very low proportion of relevant items
D (50, 35) 0.70 20 high proportion of relevant items $(p\gt 0.5)$
Table 2 reports the theoretical (derived from the formulas in Sections 4 and 5) means and variances of AP@k for the selected scenarios. The offline and online settings are placed side by side, allowing examination of their behavior under comparable conditions and observation of where the two randomization schemes produce similar outcomes and where differences emerge.
Table 2.
Values of expectation and variance for AP@k for scenarios from Table 1
Scenario SWOR Expectation SWR Expectation SWOR Variance SWR Variance
A1 0.36139 0.36416 0.05464 0.05884
A2 0.28387 0.28816 0.00735 0.01234
A3 0.43550 0.27674 0.00699 0.00775
B 0.13221 0.06878 0.00786 0.00294
C 0.07865 0.00851 0.01563 0.00023
D 0.52426 0.52778 0.01502 0.02195
As seen in Table 2, the comparison between offline (SWOR) and online (SWR) shows no uniform pattern: in some scenarios the two models behave very similarly, while in others their outcomes diverge noticeably. The observed differences reflect the $\min (m,k)$ normalization in offline evaluation and the independence structure of online sampling. Together, these results provide a clear basis for interpreting MAP@k under random rankings.
Fig. 1.
Histogram of AP@k values for Scenario A3 $(N=50,m=25,p=0.5,k=40)$. The SWOR distribution (blue) is shifted to higher values compared to SWR (orange), reflecting normalization by $\min (m,k)$ in the offline case
In addition to the values reported in Table 2, Figures 1 and 2 show the simulated distributions of AP@k for two representative scenarios. These examples illustrate how SWOR and SWR can differ not only in their means but also in their variability.
Fig. 2.
Histogram of AP@k values for Scenario C $(N=50,m=2,p=0.04,k=20)$. The SWR distribution (orange) is tightly concentrated around zero, while the SWOR distribution (blue) shows wider variability due to the finite number of relevant items
In Scenario A3 ($m\lt k$), the distributions are shifted relative to each other: the offline case yields systematically higher values due to normalization by $\min (m,k)$, while the online case, which models relevance probabilistically, results in lower means.
In Scenario C (very low prevalence), the contrast lies in variability. The SWR distribution is sharply concentrated, reflecting that almost always no relevant items appear in the top-k, whereas the SWOR distribution remains wider because the fixed number of relevant items can still occupy different positions in the ranking.
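The theoretical values in Table 2 can also be checked by direct Monte Carlo simulation. The sketch below (function names are ours) draws random rankings under both models for Scenario A1 and estimates the mean and variance of AP@k; up to sampling noise, the results should be close to the Table 2 values 0.36139/0.05464 (SWOR) and 0.36416/0.05884 (SWR). Note that the SWOR values use the $\min (m,k)$ normalization of formula (2), while the SWR values use the $1/k$ normalization of formula (1), matching the settings of Theorems 1 and 2:

```python
import random
from statistics import fmean, pvariance

def ap_sum(relevance, k):
    # The common core of formulas (1) and (2): sum of P@i * I_i over i <= k.
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / i
    return total

def simulate(N, m, p, k, n_runs=100_000, seed=1):
    rng = random.Random(seed)
    base = [1] * m + [0] * (N - m)
    swor, swr = [], []
    for _ in range(n_runs):
        rng.shuffle(base)                         # SWOR: uniform placement
        swor.append(ap_sum(base, k) / min(m, k))  # formula (2)
        rel = [rng.random() < p for _ in range(N)]
        swr.append(ap_sum(rel, k) / k)            # formula (1), Theorem 2 setting
    return fmean(swor), pvariance(swor), fmean(swr), pvariance(swr)

# Scenario A1: N = 50, m = 25, p = 0.5, k = 5
print(simulate(50, 25, 0.5, 5))
```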

7 Conclusions and discussion

The results obtained clarify what levels of AP@k are expected under random ranking and how much variability surrounds them. From a practical standpoint, these findings offer two key insights. First, the expected value of AP@k serves as a clear reference point against which observed performance can be compared. Second, the variance quantifies the random fluctuations around this reference, helping to distinguish genuine improvements from chance outcomes. Together, the expectation and the variance can be used to build a statistical test of the hypothesis that a new ranking algorithm is significantly better than a random one, by comparing the empirical MAP@k obtained from the offline or online evaluation procedure with $\operatorname{E}(AP@k)$. This task, however, is left for future research. It will also be interesting to extend this analysis to other related metrics and to investigate how statistical testing frameworks can build on these results for a robust comparison of ranking systems.

A Appendix. Proofs of Theorems 1 and 2

In the proofs, we need some technical results, which we formulate as a lemma.
Lemma 1.
For any $k\ge 1$
  • (i) ${\textstyle\sum _{i=1}^{k-1}}{\textstyle\sum _{l=i+1}^{k}}\frac{1}{i}=k({H_{k}}-1)$;
  • (ii) ${\textstyle\sum _{i=1}^{k-1}}{\textstyle\sum _{l=i+1}^{k}}\frac{1}{l}=k-{\textstyle\sum _{l=1}^{k}}\frac{1}{l}=k-{H_{k}}$;
  • (iii) ${\textstyle\sum _{i=1}^{k-1}}{\textstyle\sum _{l=i+1}^{k}}\frac{1}{i\cdot l}=\frac{1}{2}\bigg({H_{k}^{2}}-{H_{k}^{\left(2\right)}}\bigg)$,
where ${H_{k}}={\textstyle\sum _{i=1}^{k}}\frac{1}{i}$ is a standard harmonic number; ${H_{k}^{\left(2\right)}}={\textstyle\sum _{i=1}^{k}}\frac{1}{{i^{2}}}$ is the k-th partial sum of the 2nd order harmonic series.
Proof.
(i)
\[ {\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\frac{1}{i}={\sum \limits_{i=1}^{k-1}}\frac{1}{i}(k-i)=k{\sum \limits_{i=1}^{k-1}}\frac{1}{i}-(k-1)=k{\sum \limits_{i=1}^{k}}\frac{1}{i}-1-(k-1)=k({H_{k}}-1).\]
(ii)
\[ {\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\frac{1}{l}={\sum \limits_{l=2}^{k}}{\sum \limits_{i=1}^{l-1}}\frac{1}{l}={\sum \limits_{l=2}^{k}}\frac{l-1}{l}=k-1-\bigg({\sum \limits_{l=1}^{k}}\frac{1}{l}-1\bigg)=k-{\sum \limits_{l=1}^{k}}\frac{1}{l}=k-{H_{k}}.\]
(iii) For the calculation of the last equality, we can use the fact that for any finite collection of numbers ${\{{a_{il}}\}_{i,l=1}^{k}}$, such that ${a_{il}}={a_{li}}$ if $i\ne l$
\[ {\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}{a_{il}}=\frac{1}{2}\bigg({\sum \limits_{i=1}^{k}}{\sum \limits_{l=1}^{k}}{a_{il}}-{\sum \limits_{i=1}^{k}}{a_{ii}}\bigg).\]
So,
\[\begin{aligned}{}{\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\frac{1}{i\cdot l}& =\frac{1}{2}\bigg({\sum \limits_{i=1}^{k}}{\sum \limits_{l=1}^{k}}\frac{1}{i\cdot l}-{\sum \limits_{i=1}^{k}}\frac{1}{{i^{2}}}\bigg)\\ {} & =\frac{1}{2}\bigg({\bigg({\sum \limits_{i=1}^{k}}\frac{1}{i}\bigg)^{2}}-{\sum \limits_{i=1}^{k}}\frac{1}{{i^{2}}}\bigg)=\frac{1}{2}\bigg({H_{k}^{2}}-{H_{k}^{\left(2\right)}}\bigg).\end{aligned}\]
 □
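The three identities can also be verified numerically with exact rational arithmetic (a quick check of ours, not part of the original proof):

```python
from fractions import Fraction

def H(k, order=1):
    # Exact H_k and H_k^(2) as rational numbers.
    return sum(Fraction(1, i ** order) for i in range(1, k + 1))

def check_lemma(k):
    # All pairs (i, l) with 1 <= i < l <= k.
    pairs = [(i, l) for i in range(1, k) for l in range(i + 1, k + 1)]
    assert sum(Fraction(1, i) for i, l in pairs) == k * (H(k) - 1)  # (i)
    assert sum(Fraction(1, l) for i, l in pairs) == k - H(k)        # (ii)
    assert sum(Fraction(1, i * l) for i, l in pairs) \
        == (H(k) ** 2 - H(k, 2)) / 2                                # (iii)

for k in (1, 2, 5, 10, 25):
    check_lemma(k)
print("Lemma 1 verified for k = 1, 2, 5, 10, 25")
```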
Proof of Theorem 1.
For the expectation.
\[ \operatorname{E}(AP@k)=\frac{1}{\min (m,k)}{\sum \limits_{i=1}^{k}}\operatorname{E}(P@i\cdot {I_{i}}).\]
Let’s first calculate the following:
\[\begin{aligned}{}\operatorname{E}(P@i\cdot {I_{i}})& =\operatorname{E}\big(\operatorname{E}(P@i\cdot {I_{i}}|{I_{i}})\big)=\\ {} & =\operatorname{E}\big(P@i\cdot {I_{i}}|{I_{i}}=1\big)P\{{I_{i}}=1\}+\operatorname{E}\big(P@i\cdot {I_{i}}|{I_{i}}=0\big)P\{{I_{i}}=0\}=\\ {} & =\operatorname{E}\big(P@i|{I_{i}}=1\big)\cdot \frac{m}{N}=\frac{1}{i}\bigg((i-1)\frac{m-1}{N-1}+1\bigg)\cdot \frac{m}{N},\end{aligned}\]
since, if we know that ${I_{i}}=1$, then $P@i$ is distributed as $\frac{1}{i}\cdot \big(Hypergeometric(N-1,m-1,i-1)+1\big)$. Therefore,
\[\begin{aligned}{}\operatorname{E}(AP@k)& =\frac{1}{\min (m,k)}\cdot \frac{m}{N}{\sum \limits_{i=1}^{k}}\bigg((1-\frac{1}{i})\frac{m-1}{N-1}+\frac{1}{i}\bigg)=\\ {} & =\frac{1}{\min (m,k)}\cdot \frac{m}{N}\left(k\cdot \frac{m-1}{N-1}+(1-\frac{m-1}{N-1}){\sum \limits_{i=1}^{k}}\frac{1}{i}\right).\end{aligned}\]
For the variance.
Since $\operatorname{Var}(AP@k)=\operatorname{Var}\left(\frac{1}{\min (m,k)}{\textstyle\sum _{i=1}^{k}}P@i\cdot {I_{i}}\right)=\frac{1}{\min {(m,k)^{2}}}{\textstyle\sum _{i=1}^{k}}{\textstyle\sum _{l=1}^{k}}{c_{il}}$, we need to calculate all the coefficients ${c_{il}}$, taking into account that ${c_{il}}={c_{li}}=\operatorname{cov}(P@i\cdot {I_{i}},P@l\cdot {I_{l}})$ for every $i\ne l$ and ${c_{ii}}=\operatorname{Var}(P@i\cdot {I_{i}})$ for all i.
Let us calculate them step by step, starting from the coefficients ${c_{ii}}$, $i=1,2,\dots ,k$. Since in this case the random variables ${I_{1}},\dots ,{I_{N}}$ are not independent, it is worth applying formulas that use conditional expectations. In particular, for two random variables X and Y, we can calculate the variance by the formula:
\[ \operatorname{Var}(X)=\operatorname{E}\big(\operatorname{Var}(X\mid Y)\big)+\operatorname{Var}\big(\operatorname{E}(X\mid Y)\big).\]
So,
(6)
\[ {c_{ii}}=\operatorname{Var}\big(P@i\cdot {I_{i}}\big)=\operatorname{E}\big(\operatorname{Var}(P@i\cdot {I_{i}}\mid {I_{i}})\big)+\operatorname{Var}\big(\operatorname{E}(P@i\cdot {I_{i}}\mid {I_{i}})\big)={t_{1}}+{t_{2}}.\]
If ${I_{i}}=0$, then $P@i\cdot {I_{i}}=0$, and $\operatorname{E}\big(P@i\cdot {I_{i}}\mid {I_{i}}=0\big)=0$, $\operatorname{Var}\big(P@i\cdot {I_{i}}\mid {I_{i}}=0\big)=0$. If ${I_{i}}=1$, then $\operatorname{E}\big(P@i\cdot {I_{i}}\mid {I_{i}}=1\big)=\frac{1}{i}\left((i-1)\frac{m-1}{N-1}+1\right)$ and
\[ \operatorname{Var}\big(P@i\cdot {I_{i}}\mid {I_{i}}=1\big)=\operatorname{E}\left({\big(P@i\cdot {I_{i}}\big)^{2}}\mid {I_{i}}=1\right)-{\big(\operatorname{E}(P@i\cdot {I_{i}}\mid {I_{i}}=1)\big)^{2}}.\]
First of all
\[\begin{aligned}{}\operatorname{E}\big({(P@i\cdot {I_{i}})^{2}}\mid {I_{i}}=1\big)& =\operatorname{E}\left({\left(\frac{1}{i}\left({\sum \limits_{j=1}^{i-1}}{I_{j}}{I_{i}}+{I_{i}^{2}}\right)\right)^{2}}\mid {I_{i}}=1\right)=\\ {} =& \frac{1}{{i^{2}}}\operatorname{E}\left(1+2{\sum \limits_{j=1}^{i-1}}{I_{j}}+{\left({\sum \limits_{j=1}^{i-1}}{I_{j}}\right)^{2}}\mid {I_{i}}=1\right)=\\ {} =& \frac{1}{{i^{2}}}\operatorname{E}\left(1+2{\sum \limits_{j=1}^{i-1}}{I_{j}}+{\sum \limits_{j=1}^{i-1}}{I_{j}^{2}}+\sum \sum \limits_{j\ne r}{I_{j}}{I_{r}}\mid {I_{i}}=1\right)=\\ {} =& \frac{1}{{i^{2}}}\left(1+3{\sum \limits_{j=1}^{i-1}}\operatorname{E}\big({I_{j}}\mid {I_{i}}=1\big)+\sum \sum \limits_{j\ne r}\operatorname{E}\big({I_{j}}{I_{r}}\mid {I_{i}}=1\big)\right)=\\ {} =& \frac{1}{{i^{2}}}\left(1+3(i-1)\cdot \frac{m-1}{N-1}+(i-1)(i-2)\frac{(m-1)(m-2)}{(N-1)(N-2)}\right),\end{aligned}\]
as $\operatorname{E}\big({I_{j}}\mid {I_{i}}=1\big)=\frac{{C_{N-2}^{m-2}}}{{C_{N-1}^{m-1}}}=\frac{m-1}{N-1}$ and $\operatorname{E}({I_{j}}{I_{r}}\mid {I_{i}}=1)=\frac{{C_{N-3}^{m-3}}}{{C_{N-1}^{m-1}}}=\frac{(m-1)(m-2)}{(N-1)(N-2)}$.
Then
\[\begin{aligned}{}\operatorname{Var}(P@i\cdot {I_{i}}\mid {I_{i}}=1)& =\frac{1}{{i^{2}}}\left(1+3(i-1)\cdot \frac{m-1}{N-1}+(i-1)(i-2)\frac{(m-1)(m-2)}{(N-1)(N-2)}\right)\\ {} & -\frac{1}{{i^{2}}}{\left((i-1)\frac{m-1}{N-1}+1\right)^{2}}\end{aligned}\]
and thus we obtain the first term in (6):
\[\begin{aligned}{}{t_{1}}& =\operatorname{E}\big(\operatorname{Var}(P@i\cdot {I_{i}}\mid {I_{i}})\big)=\operatorname{Var}\big(P@i\cdot {I_{i}}\mid {I_{i}}=1\big)\cdot \frac{m}{N}=\\ {} & =\frac{m}{N}\cdot \frac{1}{{i^{2}}}\left(1+3(i-1)\cdot \frac{m-1}{N-1}+(i-1)(i-2)\frac{(m-1)(m-2)}{(N-1)(N-2)}\right.\\ {} & \left.-{((i-1)\frac{m-1}{N-1}+1)^{2}}\right).\end{aligned}\]
Secondly, the random variable $\operatorname{E}(P@i\cdot {I_{i}}\mid {I_{i}})$ takes the value $\frac{1}{i}\cdot \big((i-1)\frac{m-1}{N-1}+1\big)$ with probability $\frac{m}{N}$ (when ${I_{i}}=1$) and 0 otherwise. Therefore,
\[\begin{aligned}{}{t_{2}}& =\operatorname{Var}\big(\operatorname{E}(P@i\cdot {I_{i}}\mid {I_{i}})\big)=\operatorname{E}\big({(\operatorname{E}(P@i\cdot {I_{i}}\mid {I_{i}}))^{2}}\big)-{\big(\operatorname{E}(\operatorname{E}(P@i\cdot {I_{i}}\mid {I_{i}}))\big)^{2}}=\\ {} & =\frac{1}{{i^{2}}}{\left((i-1)\frac{m-1}{N-1}+1\right)^{2}}\cdot \frac{m}{N}-\frac{1}{{i^{2}}}{\left((i-1)\frac{m-1}{N-1}+1\right)^{2}}\cdot {\left(\frac{m}{N}\right)^{2}}=\\ {} & =\frac{1}{{i^{2}}}{\left((i-1)\frac{m-1}{N-1}+1\right)^{2}}\cdot \frac{m}{N}\left(1-\frac{m}{N}\right).\end{aligned}\]
Finally, after collecting the coefficients of the powers $\frac{1}{{i^{r}}}$, $r=0,1,2$, we obtain the following expression:
\[\begin{aligned}{}{c_{ii}}=& \operatorname{Var}\big(P@i\cdot {I_{i}}\big)={t_{1}}+{t_{2}}=\\ {} =& \frac{m}{N}\cdot \frac{1}{{i^{2}}}\left(1+3(i-1)\cdot \frac{m-1}{N-1}+(i-1)(i-2)\frac{(m-1)(m-2)}{(N-1)(N-2)}\right)-\\ {} & -\frac{m}{N}\cdot \frac{1}{{i^{2}}}{\left((i-1)\frac{m-1}{N-1}+1\right)^{2}}+\frac{1}{{i^{2}}}{\left((i-1)\frac{m-1}{N-1}+1\right)^{2}}\cdot \frac{m}{N}\left(1-\frac{m}{N}\right)=\\ {} =& \frac{m}{N}\left(1-\frac{m}{N}-\frac{m-1}{N-1}\left(3-2\frac{m-2}{N-2}-\frac{m}{N}\left(2-\frac{m-1}{N-1}\right)\right)\right)\frac{1}{{i^{2}}}+\\ {} & +\frac{m}{N}\frac{(m-1)}{(N-1)}\left(3\left(1-\frac{m-2}{N-2}\right)-2\frac{m}{N}\left(1-\frac{m-1}{N-1}\right)\right)\frac{1}{i}+\\ {} & +\frac{m}{N}\frac{(m-1)}{(N-1)}\frac{(m-2)}{(N-2)}-{\left(\frac{m(m-1)}{N(N-1)}\right)^{2}}.\end{aligned}\]
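The collected closed form for ${c_{ii}}$ can be checked against an exact enumeration of all ${C_{N}^{m}}$ equally likely placements of the m relevant items among N positions. The sketch below does this for a few small instances; the helper names are ours, not from the paper:

```python
from itertools import combinations

def cii_formula(i, m, N):
    # Closed form for c_ii = Var(P@i * I_i), collected by powers of 1/i.
    r, q, s = m / N, (m - 1) / (N - 1), (m - 2) / (N - 2)
    a = r * (1 - r - q * (3 - 2 * s - r * (2 - q)))
    b = r * q * (3 * (1 - s) - 2 * r * (1 - q))
    c = r * q * s - (r * q) ** 2
    return a / i ** 2 + b / i + c

def cii_exact(i, m, N):
    # Brute force: uniform over all C(N, m) placements of the relevant items.
    vals = []
    for ones in combinations(range(1, N + 1), m):
        I = [1 if pos in ones else 0 for pos in range(1, N + 1)]
        vals.append((sum(I[:i]) / i) * I[i - 1])  # P@i * I_i
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

for i, m, N in [(1, 3, 6), (2, 3, 6), (4, 4, 8)]:
    print(i, m, N, cii_formula(i, m, N), cii_exact(i, m, N))  # columns agree
```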
(ii) For all i and l such that $1\le i\lt l\le k$,
\[ {c_{il}}={c_{li}}=\operatorname{cov}(P@i\cdot {I_{i}},P@l\cdot {I_{l}})=\operatorname{E}(P@i\cdot {I_{i}}\cdot P@l\cdot {I_{l}})-\operatorname{E}(P@i\cdot {I_{i}})\cdot \operatorname{E}(P@l\cdot {I_{l}}).\]
Here, we can also use conditional expectations for the calculation:
\[\begin{aligned}{}\operatorname{E}(P@i{I_{i}}\cdot P@l{I_{l}})& =\operatorname{E}\big(\operatorname{E}(P@i{I_{i}}\cdot P@l{I_{l}}\mid {I_{i}}{I_{l}})\big)\\ {} & =\operatorname{E}(P@i{I_{i}}\cdot P@l{I_{l}}\mid {I_{i}}{I_{l}}=1)P({I_{i}}{I_{l}}=1).\end{aligned}\]
If ${I_{i}}{I_{l}}=1$, then we know that the items in the ith and lth places are relevant, so we should take into account the possible arrangements of the other $m-2$ relevant items among the remaining $N-2$ places. So, $P({I_{i}}{I_{l}}=1)=\frac{{C_{N-2}^{m-2}}}{{C_{N}^{m}}}=\frac{m(m-1)}{N(N-1)}$. Then
\[\begin{aligned}{}\operatorname{E}(P@i& \cdot {I_{i}}\cdot P@l\cdot {I_{l}}\mid {I_{i}}{I_{l}}=1)=\operatorname{E}\left(\frac{1}{i}{\sum \limits_{j=1}^{i}}{I_{j}}\cdot \frac{1}{l}{\sum \limits_{r=1}^{l}}{I_{r}}\mid {I_{i}}{I_{l}}=1\right)\\ {} =& \frac{1}{i\cdot l}\operatorname{E}\left(\left(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\right)\left(1+{\sum \limits_{r=1}^{i-1}}{I_{r}}+{\sum \limits_{r=i+1}^{l-1}}{I_{r}}+1\right)\mid {I_{i}}{I_{l}}=1\right)\\ {} =& \frac{1}{i\cdot l}\operatorname{E}\left(2+4{\sum \limits_{r=1}^{i-1}}{I_{r}}+{\sum \limits_{r=i+1}^{l-1}}{I_{r}}+2{\sum \limits_{j=1}^{i-2}}{\sum \limits_{r=j+1}^{i-1}}{I_{j}}{I_{r}}+{\sum \limits_{j=1}^{i-1}}{\sum \limits_{r=i+1}^{l-1}}{I_{j}}{I_{r}}\mid {I_{i}}{I_{l}}=1\right)\\ {} =& \frac{1}{i\cdot l}\left(2+4(i-1)\frac{m-2}{N-2}+(l-1-i)\frac{m-2}{N-2}+(i-1)(i-2)\frac{(m-2)(m-3)}{(N-2)(N-3)}\right.\\ {} & \left.+(i-1)(l-1-i)\frac{(m-2)(m-3)}{(N-2)(N-3)}\right)=\\ {} =& \frac{1}{i\cdot l}\left(2+\frac{m-2}{N-2}(3i-5+l)+\frac{(m-2)(m-3)}{(N-2)(N-3)}(i-1)(l-3)\right).\end{aligned}\]
So, after putting all the components together and doing some algebra, we obtain
\[\begin{aligned}{}{c_{il}}& =\frac{m(m-1)}{N(N-1)}\frac{1}{i\cdot l}\left[2-5\frac{m-2}{N-2}+i\cdot 3\frac{m-2}{N-2}+l\frac{m-2}{N-2}+i\cdot l\frac{(m-2)(m-3)}{(N-2)(N-3)}-\right.\\ {} & \left.-i\cdot 3\frac{(m-2)(m-3)}{(N-2)(N-3)}-l\frac{(m-2)(m-3)}{(N-2)(N-3)}+3\frac{(m-2)(m-3)}{(N-2)(N-3)}\right]-\\ {} & -{\left(\frac{m}{N}\right)^{2}}\frac{1}{i\cdot l}\left(1+(i-1)\frac{m-1}{N-1}\right)\left(1+(l-1)\frac{m-1}{N-1}\right)=\\ {} =& \frac{m}{N}\frac{1}{i\cdot l}\left[\frac{m-1}{N-1}\left(2-5\frac{m-2}{N-2}+3\frac{(m-2)(m-3)}{(N-2)(N-3)}\right)-\frac{m}{N}{\left(1-\frac{m-1}{N-1}\right)^{2}}+\right.\\ {} & +i\frac{m-1}{N-1}\left(3\frac{m-2}{N-2}\left(1-\frac{m-3}{N-3}\right)-\frac{m}{N}\left(1-\frac{m-1}{N-1}\right)\right)+\\ {} & +l\frac{m-1}{N-1}\left(\frac{m-2}{N-2}\left(1-\frac{m-3}{N-3}\right)-\frac{m}{N}\left(1-\frac{m-1}{N-1}\right)\right)+\\ {} & +\left.i\cdot l\frac{m-1}{N-1}\left(\frac{(m-2)(m-3)}{(N-2)(N-3)}-\frac{m}{N}\frac{m-1}{N-1}\right)\right].\end{aligned}\]
Using the notation from the statement of Theorem 1, writing $M=\min (m,k)$ for brevity, and applying the results of Lemma 1, we get:
\[\begin{aligned}{}\operatorname{Var}& {_{SWOR}}(AP@k)=\frac{1}{{M^{2}}}\Bigg({\sum \limits_{i=1}^{k}}{c_{ii}}+2{\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}{c_{il}}\Bigg)=\\ {} =& \frac{1}{{M^{2}}}\Bigg[{\sum \limits_{i=1}^{k}}\frac{m}{N}\Bigg(\frac{A}{{i^{2}}}+\frac{B}{i}+C\Bigg)+2{\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\frac{m}{N}\Bigg(\frac{D}{i\cdot l}+\frac{E}{l}+\frac{F}{i}+G\Bigg)\Bigg]=\\ {} =& \frac{1}{{M^{2}}}\frac{m}{N}\Bigg[A\cdot {H_{k}^{\left(2\right)}}+B\cdot {H_{k}}+kC+D({H_{k}^{2}}-{H_{k}^{\left(2\right)}})+\\ {} & +2E(k-{H_{k}})+2Fk({H_{k}}-1)+Gk(k-1)\Bigg]=\\ {} =& \frac{1}{{M^{2}}}\frac{m}{N}\Bigg[k[C+2(E-F)+(k-1)G]+{H_{k}}(B-2(E-kF))+\\ {} & +{H_{k}^{2}}\cdot D+{H_{k}^{\left(2\right)}}(A-D)\Bigg].\end{aligned}\]
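For small instances the whole expression can be validated by exhaustive enumeration. The sketch below assembles $\operatorname{Var}_{SWOR}(AP@k)$ directly from the closed forms for ${c_{ii}}$ and ${c_{il}}$ derived above (rather than via the constants A–G, whose definitions appear in the statement of Theorem 1 and are not reproduced here) and compares it with the exact variance over all ${C_{N}^{m}}$ relevance patterns; the helper names are ours:

```python
from itertools import combinations

def cov_terms(m, N):
    # Closed forms for c_ii and c_il under sampling without replacement.
    r, q = m / N, (m - 1) / (N - 1)
    s, s2 = (m - 2) / (N - 2), (m - 2) * (m - 3) / ((N - 2) * (N - 3))

    def cii(i):
        a = r * (1 - r - q * (3 - 2 * s - r * (2 - q)))
        b = r * q * (3 * (1 - s) - 2 * r * (1 - q))
        c = r * q * s - (r * q) ** 2
        return a / i ** 2 + b / i + c

    def cil(i, l):  # covariance term, i < l
        const = q * (2 - 5 * s + 3 * s2) - r * (1 - q) ** 2
        coef_i = q * (3 * (s - s2) - r * (1 - q))
        coef_l = q * ((s - s2) - r * (1 - q))
        coef_il = q * (s2 - r * q)
        return (r / (i * l)) * (const + i * coef_i + l * coef_l + i * l * coef_il)

    return cii, cil

def var_formula(k, m, N):
    cii, cil = cov_terms(m, N)
    total = sum(cii(i) for i in range(1, k + 1))
    total += 2 * sum(cil(i, l) for i in range(1, k) for l in range(i + 1, k + 1))
    return total / min(m, k) ** 2

def var_exact(k, m, N):
    # Brute force over all C(N, m) equally likely relevance patterns.
    M, vals = min(m, k), []
    for ones in combinations(range(N), m):
        I = [1 if j in ones else 0 for j in range(N)]
        vals.append(sum((sum(I[:i]) / i) * I[i - 1] for i in range(1, k + 1)) / M)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

for k, m, N in [(3, 4, 8), (5, 3, 7)]:
    print(k, m, N, var_formula(k, m, N), var_exact(k, m, N))  # columns agree
```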
This completes the proof.  □
Proof of Theorem 2.
For the expectation.
If we consider the relevance model that corresponds to sampling with replacement with a fixed probability of success p, then we can use $\frac{1}{k}$ for normalization, since in this case all k items can be relevant. So, by the definition of $AP@k$, we have
\[\begin{aligned}{}\operatorname{E}(AP@k)& =\operatorname{E}\bigg(\frac{1}{k}{\sum \limits_{i=1}^{k}}P@i\cdot {I_{i}}\bigg)=\frac{1}{k}{\sum \limits_{i=1}^{k}}\operatorname{E}\bigg(\frac{1}{i}{\sum \limits_{j=1}^{i}}{I_{j}}\cdot {I_{i}}\bigg)\\ {} & =\frac{1}{k}{\sum \limits_{i=1}^{k}}\bigg(\frac{1}{i}{\sum \limits_{j=1}^{i}}\operatorname{E}({I_{j}}\cdot {I_{i}})\bigg).\end{aligned}\]
For all $j\ne i$, $\operatorname{E}({I_{j}}\cdot {I_{i}})=\operatorname{E}({I_{j}})\cdot \operatorname{E}({I_{i}})={p^{2}}$, as these random variables are independent. If $j=i$, then $\operatorname{E}({I_{i}^{2}})=\operatorname{E}({I_{i}})=P\{{I_{i}}=1\}=p$. So,
\[\begin{aligned}{}\operatorname{E}(AP@k)& =\frac{1}{k}{\sum \limits_{i=1}^{k}}\frac{1}{i}\bigg({\sum \limits_{j=1}^{i-1}}\operatorname{E}({I_{j}}\cdot {I_{i}})+\operatorname{E}({I_{i}})\bigg)=\frac{1}{k}{\sum \limits_{i=1}^{k}}\frac{1}{i}((i-1){p^{2}}+p)=\\ {} & =\frac{p}{k}{\sum \limits_{i=1}^{k}}(p+(1-p)\frac{1}{i})={p^{2}}+p(1-p)\frac{1}{k}{\sum \limits_{i=1}^{k}}\frac{1}{i}.\end{aligned}\]
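This closed form is easy to confirm exactly for small k by enumerating all ${2^{k}}$ Bernoulli relevance patterns; the sketch below does so (helper names are ours, not from the paper):

```python
from itertools import product

def exp_formula(k, p):
    # E(AP@k) = p^2 + p(1-p) * H_k / k
    Hk = sum(1 / i for i in range(1, k + 1))
    return p ** 2 + p * (1 - p) * Hk / k

def exp_exact(k, p):
    # Enumerate all 2^k relevance patterns, weighted by Bernoulli(p) probabilities.
    total = 0.0
    for I in product((0, 1), repeat=k):
        w = 1.0
        for b in I:
            w *= p if b else 1 - p
        total += w * sum((sum(I[:i]) / i) * I[i - 1] for i in range(1, k + 1)) / k
    return total

for k, p in [(1, 0.5), (5, 0.3), (7, 0.8)]:
    print(k, p, exp_formula(k, p), exp_exact(k, p))  # columns agree
```

Note that for $k=1$ the formula collapses to $\operatorname{E}(AP@1)=p$, as it must, since $AP@1={I_{1}}$.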
For the variance. As before,
\[ \operatorname{Var}(AP@k)=\operatorname{Var}\left(\frac{1}{k}{\sum \limits_{i=1}^{k}}P@i\cdot {I_{i}}\right)=\frac{1}{{k^{2}}}{\sum \limits_{i=1}^{k}}{\sum \limits_{l=1}^{k}}{c_{il}},\]
and we need to calculate the coefficients ${c_{il}}$, taking into account that for every $i\ne l$, ${c_{il}}={c_{li}}=\operatorname{cov}(P@i\cdot {I_{i}},P@l\cdot {I_{l}})$, and for all i, ${c_{ii}}=\operatorname{Var}(P@i\cdot {I_{i}})$.
Let us calculate them again one by one.
(i) For $i=1,2,\dots ,k$
\[\begin{aligned}{}{c_{ii}}& =\operatorname{Var}\left(\frac{1}{i}{\sum \limits_{j=1}^{i}}{I_{j}}{I_{i}}\right)=\frac{1}{{i^{2}}}\operatorname{Var}\left({\sum \limits_{j=1}^{i-1}}{I_{j}}{I_{i}}+{I_{i}^{2}}\right)=\\ {} & =\frac{1}{{i^{2}}}\operatorname{Var}\left({I_{i}}\left(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\right)\right).\end{aligned}\]
Here, the random variables ${I_{i}}$ and $1+{\textstyle\sum _{j=1}^{i-1}}{I_{j}}$ are independent. The variance of the product of two independent random variables X and Y can be calculated using the following rule:
\[ \operatorname{Var}(X\cdot Y)=\operatorname{E}\big({X^{2}}\big)\operatorname{E}\big({Y^{2}}\big)-{\big(\operatorname{E}(X)\big)^{2}}{\big(\operatorname{E}(Y)\big)^{2}}.\]
Since $\operatorname{E}\big({I_{i}^{2}}\big)=\operatorname{E}({I_{i}})=p$,
$\operatorname{E}\left(1+{\textstyle\sum _{j=1}^{i-1}}{I_{j}}\right)=1+{\textstyle\sum _{j=1}^{i-1}}\operatorname{E}\big({I_{j}}\big)=1+(i-1)p$, and
\[\begin{aligned}{}\operatorname{E}& \left({\left(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\right)^{2}}\right)=\operatorname{E}\left(1+2{\sum \limits_{j=1}^{i-1}}{I_{j}}+{\left({\sum \limits_{j=1}^{i-1}}{I_{j}}\right)^{2}}\right)=\\ {} & =\operatorname{E}\left(1+2{\sum \limits_{j=1}^{i-1}}{I_{j}}+{\sum \limits_{j=1}^{i-1}}{I_{j}^{2}}+2{\sum \limits_{j=1}^{i-2}}{\sum \limits_{r=j+1}^{i-1}}{I_{j}}{I_{r}}\right)=\\ {} & =1+3(i-1)p+(i-1)(i-2){p^{2}},\end{aligned}\]
we obtain
\[\begin{aligned}{}{c_{ii}}& =\frac{1}{{i^{2}}}\operatorname{Var}\left({I_{i}}\cdot \left(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\right)\right)=\\ {} & =\frac{1}{{i^{2}}}\left(\operatorname{E}\left({I_{i}^{2}}\right)\operatorname{E}\left({\left(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\right)^{2}}\right)-{\left(\operatorname{E}({I_{i}})\right)^{2}}{\left(\operatorname{E}\left(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\right)\right)^{2}}\right)=\\ {} & =\frac{1}{{i^{2}}}\left(p\left(1+3(i-1)p+(i-1)(i-2){p^{2}}\right)-{p^{2}}{\left(1+(i-1)p\right)^{2}}\right)=\\ {} & =p(1-p)\left({p^{2}}+p(3-2p)\frac{1}{i}+(1-3p+{p^{2}})\frac{1}{{i^{2}}}\right).\end{aligned}\]
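Again, the closed form for ${c_{ii}}$ can be confirmed by exact enumeration of the ${2^{i}}$ patterns of ${I_{1}},\dots ,{I_{i}}$ (helper names are ours):

```python
from itertools import product

def cii_formula(i, p):
    # c_ii = p(1-p) * (p^2 + p(3-2p)/i + (1-3p+p^2)/i^2)
    return p * (1 - p) * (p ** 2 + p * (3 - 2 * p) / i
                          + (1 - 3 * p + p ** 2) / i ** 2)

def cii_exact(i, p):
    # Exact Var(P@i * I_i) over all 2^i Bernoulli(p) patterns.
    ex = ex2 = 0.0
    for I in product((0, 1), repeat=i):
        w = 1.0
        for b in I:
            w *= p if b else 1 - p
        x = (sum(I) / i) * I[-1]  # P@i * I_i
        ex += w * x
        ex2 += w * x * x
    return ex2 - ex ** 2

for i, p in [(1, 0.4), (4, 0.25), (6, 0.9)]:
    print(i, p, cii_formula(i, p), cii_exact(i, p))  # columns agree
```

At $i=1$ the expression reduces to $p(1-p)$, the variance of a single Bernoulli indicator, which is a useful consistency check.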
(ii) For all i and l such that $1\le i\lt l\le k$,
\[ {c_{il}}={c_{li}}=\operatorname{cov}({I_{i}}\cdot P@i,{I_{l}}\cdot P@l)=\operatorname{E}({I_{i}}\cdot P@i\cdot {I_{l}}\cdot P@l)-\operatorname{E}({I_{i}}\cdot P@i)\cdot \operatorname{E}({I_{l}}\cdot P@l).\]
Here
\[\begin{aligned}{}\operatorname{E}& ({I_{i}}\cdot P@i\cdot {I_{l}}\cdot P@l)=\frac{1}{i\cdot l}\operatorname{E}\Big({I_{i}}\Big(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\Big)\cdot {I_{l}}\Big(1+{\sum \limits_{r=1}^{l-1}}{I_{r}}\Big)\Big)\\ {} & =\frac{1}{i\cdot l}\operatorname{E}\Big({I_{i}}{I_{l}}\Big(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\Big)\Big(1+{\sum \limits_{r=1}^{i-1}}{I_{r}}+{I_{i}}+{\sum \limits_{r=i+1}^{l-1}}{I_{r}}\Big)\Big)\\ {} & =\frac{1}{i\cdot l}\operatorname{E}\Big({I_{i}}{I_{l}}{\Big(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\Big)^{2}}+{I_{i}^{2}}{I_{l}}\Big(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\Big)+{I_{i}}{I_{l}}\Big(1+{\sum \limits_{j=1}^{i-1}}{I_{j}}\Big)\Big({\sum \limits_{r=i+1}^{l-1}}{I_{r}}\Big)\Big)\\ {} & =\frac{{p^{2}}}{i\cdot l}\Big(2+(3i-5+l)p+(i-1)(l-3){p^{2}}\Big).\end{aligned}\]
Therefore,
\[\begin{aligned}{}{c_{il}}& =\frac{{p^{2}}}{i\cdot l}\Big(2+(3i-5+l)p+(i-1)(l-3){p^{2}}-\Big(1+(i-1)p\Big)\Big(1+(l-1)p\Big)\Big)=\\ {} & =\frac{1}{l}{p^{2}}(1-p)\Big((1-2p)\frac{1}{i}+2p\Big).\end{aligned}\]
Summing up all the coefficients ${\{{c_{il}}\}_{i,l=1}^{k}}$ we obtain
\[\begin{aligned}{}{\sum \limits_{i=1}^{k}}{\sum \limits_{l=1}^{k}}{c_{il}}=& {\sum \limits_{i=1}^{k}}{c_{ii}}+2{\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}{c_{il}}=\\ {} =& {\sum \limits_{i=1}^{k}}\Bigg(p(1-p)\Big({p^{2}}+p(3-2p)\frac{1}{i}+(1-3p+{p^{2}})\frac{1}{{i^{2}}}\Big)\Bigg)+\\ {} & +2{\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}{p^{2}}(1-p)\Bigg((1-2p)\frac{1}{i\cdot l}+2p\frac{1}{l}\Bigg)=\\ {} =& k{p^{3}}(1-p)+{p^{2}}(1-p)(3-2p){\sum \limits_{i=1}^{k}}\frac{1}{i}+p(1-p)(1-3p+{p^{2}}){\sum \limits_{i=1}^{k}}\frac{1}{{i^{2}}}+\\ {} & +2{p^{2}}(1-p)(1-2p){\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\frac{1}{i\cdot l}+4{p^{3}}(1-p){\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\frac{1}{l}.\end{aligned}\]
Using the results from Lemma 1, we finally get
\[\begin{aligned}{}\operatorname{Var}& (AP@k)=\frac{1}{{k^{2}}}\Bigg({\sum \limits_{i=1}^{k}}{c_{ii}}+2{\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}{c_{il}}\Bigg)=\\ {} =& \frac{1}{{k^{2}}}\Bigg(p(1-p){\sum \limits_{i=1}^{k}}\Big({p^{2}}+p(3-2p)\frac{1}{i}+(1-3p+{p^{2}})\frac{1}{{i^{2}}}\Big)+\\ {} & +2{p^{2}}(1-p){\sum \limits_{i=1}^{k-1}}{\sum \limits_{l=i+1}^{k}}\Big((1-2p)\frac{1}{i\cdot l}+2p\frac{1}{l}\Big)\Bigg)=\\ {} =& \frac{5}{k}{p^{3}}(1-p)+\frac{1}{{k^{2}}}p(1-p)\left(p(1-2p)\big(3{H_{k}}+{H_{k}^{2}}\big)+(1-p)(1-3p){H_{k}^{(2)}}\right).\end{aligned}\]
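The final expression can be validated in the same way, by comparing it with the exact variance computed over all ${2^{k}}$ Bernoulli relevance patterns (helper names are ours):

```python
from itertools import product

def var_formula(k, p):
    # Var(AP@k) = 5p^3(1-p)/k
    #           + p(1-p)[p(1-2p)(3H_k + H_k^2) + (1-p)(1-3p)H_k^(2)]/k^2
    Hk = sum(1 / i for i in range(1, k + 1))
    Hk2 = sum(1 / i ** 2 for i in range(1, k + 1))
    return (5 / k) * p ** 3 * (1 - p) + (1 / k ** 2) * p * (1 - p) * (
        p * (1 - 2 * p) * (3 * Hk + Hk ** 2) + (1 - p) * (1 - 3 * p) * Hk2)

def var_exact(k, p):
    # Exact Var(AP@k) over all 2^k Bernoulli(p) relevance patterns.
    ex = ex2 = 0.0
    for I in product((0, 1), repeat=k):
        w = 1.0
        for b in I:
            w *= p if b else 1 - p
        ap = sum((sum(I[:i]) / i) * I[i - 1] for i in range(1, k + 1)) / k
        ex += w * ap
        ex2 += w * ap * ap
    return ex2 - ex ** 2

for k, p in [(1, 0.5), (4, 0.3), (6, 0.7)]:
    print(k, p, var_formula(k, p), var_exact(k, p))  # columns agree
```

For $k=1$ the formula reduces to $p(1-p)=\operatorname{Var}({I_{1}})$, as expected.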
 □

Acknowledgments

The authors thank the anonymous referees for their careful reading of the first version of the manuscript and for valuable remarks that helped to significantly improve the paper.
