Codebase for using user query variations (UQVs) to improve efficiency and effectiveness of online rank fusion.
These algorithms were presented in the following paper. Please cite it if you extend our work:
Rodger Benham, Joel Mackenzie, Alistair Moffat, and J. Shane Culpepper: Boosting Search Performance Using Query Variations. In ACM Transactions on Information Systems (TOIS), October, 2019.
@article{bmmc19-tois,
author = {R. Benham and J. Mackenzie and A. Moffat and J. S. Culpepper},
title = {Boosting Search Performance Using Query Variations},
journal = "ACM Trans. on Information Systems",
year = {2019},
volume = {37},
number = {4},
pages = {41.1--41.25},
}
Assume, for the sake of this example, that we have the three following UQVs for a single topic, with a topic ID of 1:
- raspberry pi
- raspberry pi price
- raspberry pi cost
The input query format will then be:
1:raspberry pi
1:raspberry pi price
1:raspberry pi cost
For our experiments, we use the UQV100 collection. The resources can be found here.
The original codebase was based on the variable block-max wand code, which can be found here. However, as this codebase is not maintained, we have ported our implementations to the successor library, known as PISA.
The modifications to PISA are hosted on a fork which is included here as a submodule.
The idea behind SP-CS processing is to treat each multi-query as one large weighted query, where the weight of each term corresponds to how many query variations the term was present in. Take our example query above - the SP-CS query can be thought of as 3 raspberry 3 pi 1 price 1 cost, where each term is preceded by its weight.
- Only works with additive scoring regimes, and will output CombSUM fusion results by default. Can be extended to any other additive fusion method with little effort.
Binaries: evaluate_single_pass_combsum
for TREC outputs,
single_pass_combsum
for efficiency experiments.
Parallel fusion simply spins off a thread for every query, and fuse the resultant heaps via CombSUM (or any other fusion method -- we implement CombSUM here).
Binaries: evaluate_parallel_combsum
for TREC outputs, parallel_combsum
for
efficiency experiments.