Abstract
Off-policy evaluation (OPE) and optimization for learning to rank (LTR) leverage document placement probabilities to correct for the effects of various statistical biases, e.g., position bias. However, computing these propensities poses a challenge, as for most ranking models this requires iterating over all possible rankings. A common solution is to approximate them by sampling multiple rankings and using the observed document frequencies per position. Nevertheless, even when using extremely large numbers of sampled rankings, these estimates often still contain significant estimation errors. In this work, we propose the novel marginalized Plackett-Luce (MPL) method to efficiently and accurately calculate document-rank placement probabilities under the widely used Plackett-Luce (PL) ranking model. In particular, we establish MPL by first showing that this probability is the expected value of a Poisson binomial distribution over the document scores; subsequently, we leverage a known connection between the Poisson binomial distribution, convolutional operations and numerical integration, to achieve efficient and accurate propensity estimation. Furthermore, we argue that MPL provides near-exact estimation when computing the function over a practical number of evaluation points. Our experiments confirm that the propensity estimation of MPL is highly accurate, efficient, and leads to substantial improvements over the sampling-based method in downstream applications, thus opening the door to a wider use of PL policies in off-policy learning to rank.
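The connection between the Poisson binomial distribution and convolution that the abstract alludes to can be illustrated with a minimal sketch (this is our own illustration, not the paper's MPL implementation; the function name and use of NumPy are our assumptions): the probability mass function of the number of successes among independent Bernoulli trials is obtained by iteratively convolving the two-point distributions of the individual trials.

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """PMF of the Poisson binomial distribution, i.e. the number of
    successes among independent Bernoulli trials with (possibly
    different) success probabilities `probs`.

    Illustrative sketch only: computed by iteratively convolving each
    trial's two-point distribution [1 - p, p] into the running PMF,
    which is the convolutional view referenced in the abstract.
    """
    pmf = np.array([1.0])  # zero trials: P(0 successes) = 1
    for p in probs:
        # Convolving with [1 - p, p] accounts for one more trial:
        # either it fails (shift nothing) or succeeds (shift counts by 1).
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

# Example: two fair coins give P(0)=0.25, P(1)=0.5, P(2)=0.25.
pmf = poisson_binomial_pmf([0.5, 0.5])
```

This runs in O(n^2) time for n trials, in contrast to the exponential cost of enumerating all outcome subsets, which is the kind of saving that motivates avoiding enumeration over all rankings.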