Average precision at cutoff k under random rankings: expectation and variance
Pub. online: 2 April 2026
Type: Research Article
Open Access
Received: 2 November 2025
Accepted: 28 February 2026
Published: 2 April 2026
Notes
This paper is dedicated to the 85th anniversary of the birth of Prof. Yuriy Kozachenko.
Abstract
Recommender systems and information retrieval platforms rely on ranking algorithms to present the most relevant items to users, thereby improving engagement and satisfaction. Assessing the quality of these rankings requires reliable evaluation metrics. Among them, Mean Average Precision at cutoff k (MAP@k) is widely used, as it accounts for both the relevance of items and their positions in the list, averaged over a group of users.
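To make the metric concrete, here is a minimal Python sketch of AP@k and MAP@k. It assumes one common convention: precision is summed over the positions i ≤ k that hold a relevant item, and the sum is divided by min(k, R), where R is the number of relevant items for the user. Other normalizations are in use, and the definition in this article may differ. The function names `ap_at_k` and `map_at_k` are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def ap_at_k(ranking: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Average precision at cutoff k for one user.

    Assumed convention: sum precision@i over positions i <= k that hold a
    relevant item, then divide by min(k, |relevant|).
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for i, item in enumerate(ranking[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at position i
    return total / min(k, len(relevant))


def map_at_k(rankings: List[Sequence[Hashable]],
             relevant_sets: List[Set[Hashable]], k: int) -> float:
    """Mean of AP@k over a group of users (one ranking and relevant set per user)."""
    scores = [ap_at_k(r, rel, k) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores) if scores else 0.0


# Items 'a' and 'c' are relevant; with k = 3 this gives (1/1 + 2/3) / 2, about 0.833.
print(ap_at_k(["a", "b", "c", "d"], {"a", "c"}, k=3))
```

Under this convention a perfect ranking yields AP@k = 1, and MAP@k is simply the arithmetic mean of the per-user scores.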
It seems obvious that intelligent ranking algorithms should outperform recommendations generated at random. But how can we quantify how much better they perform? In this article, we establish the expected value and variance of average precision at k (AP@k), which can serve as a foundation for efficiency criteria for MAP@k. We consider two widely used evaluation models, offline and online, together with the corresponding randomization models, and compute the expected value and variance of AP@k in both cases. A numerical study of different scenarios is also performed.
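The quantities studied here can also be approximated numerically, which is useful for checking closed-form results. The sketch below assumes one simple randomization model: the R relevant items occupy a uniformly random set of positions among N ranked items (equivalently, the ranking is a uniformly random permutation). It also assumes the min(k, R) normalization of AP@k. This model is not claimed to coincide with either the offline or the online model considered in the article. For small N the sketch computes E[AP@k] and Var[AP@k] exactly by enumerating all C(N, R) equally likely placements; for larger N it falls back to Monte Carlo sampling. All function names are hypothetical.

```python
import itertools
import random
import statistics


def ap_from_positions(positions, k, r):
    """AP@k given the 1-based positions of the r relevant items (assumes r >= 1, k >= 1).

    Assumed normalization: divide by min(k, r).
    """
    total = 0.0
    for j, p in enumerate(sorted(positions), start=1):
        if p > k:
            break  # remaining relevant items fall below the cutoff
        total += j / p  # precision at the j-th relevant hit
    return total / min(k, r)


def exact_mean_var(n, r, k):
    """Exact E and Var of AP@k when all C(n, r) placements are equally likely."""
    values = [ap_from_positions(c, k, r)
              for c in itertools.combinations(range(1, n + 1), r)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def monte_carlo_mean_var(n, r, k, trials=100_000, seed=0):
    """Monte Carlo estimates of E and Var of AP@k under the same random model."""
    rng = random.Random(seed)
    values = [ap_from_positions(rng.sample(range(1, n + 1), r), k, r)
              for _ in range(trials)]
    return statistics.fmean(values), statistics.pvariance(values)


print(exact_mean_var(10, 3, 5))
print(monte_carlo_mean_var(10, 3, 5))
```

The two printed pairs should agree to within sampling error. Exact enumeration grows as C(N, R), so it is only practical for small catalogues, which is where analytical expressions for the expectation and variance become valuable.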