Publications
30 publications total
2026
Reproduce Accepted SIGIR-2026
Abstract Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has shown that simple prompt injections embedded within a candidate document (jailbreak prompt attacks) can significantly alte...
Short Accepted SIGIR-2026
Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning
Abstract This paper introduces Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task. Existing document reranking methods based...
Reproduce Accepted SIGIR-2026
Beyond Chunk-Then-Embed: A Comprehensive Taxonomy and Evaluation of Document Segmentation Strategies for Information Retrieval
Abstract Document chunking is a critical preprocessing step in dense retrieval systems, yet the design space of chunking strategies remains poorly understood. Recent research has proposed several concurrent approaches...
Long Accepted EACL-2026
🤗 Live Demo AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation
Abstract We present AutoBool, a reinforcement learning (RL) framework that trains large language models (LLMs) to generate effective Boolean queries for medical systematic reviews. Boolean queries are the primary mech...
Short Accepted ECIR-2026
Evalugator🐊—Rapid, Agile Development and Evaluation of Retrieval Augmented Generation Systems Without Labels
Abstract Evaluating complex Retrieval Augmented Generation (RAG) systems in real-world settings is challenging. There is often a lack of fine-grained labelled data and the absence of comprehensive evaluation tools tha...
2025
Reproduce Accepted SIGIR-2025
Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition
Abstract Dense retrievers utilize pre-trained backbone language models (e.g., BERT, LLaMA) that are fine-tuned via contrastive learning to perform the task of encoding text into dense representations that can then be ...
Reproduce Accepted SIGIR-2025
Reassessing Large Language Model Boolean Query Generation for Systematic Reviews
Abstract Systematic reviews are comprehensive literature reviews that address highly focused research questions and represent the highest form of evidence in medicine. A critical step in this process is the developmen...
Reproduce Accepted SIGIR-2025
2D Matryoshka Training for Information Retrieval
Abstract 2D Matryoshka Training is an advanced embedding representation training approach designed to train an encoder model simultaneously across various layer-dimension setups. This method has demonstrated higher ef...
Long ECIR-2025
Corpus Subsampling: Estimating the Effectiveness of Neural Retrieval Models on Large Corpora
Abstract Due to their low efficiency, neural retrieval models are usually evaluated on small corpora (e.g. MS MARCO or BEIR subsets) or in re-ranking scenarios using a more efficient first-stage retriever. To estimate...
2024
Long ECIR-2026
Starbucks: Improved Training for 2D Matryoshka Embeddings
Abstract Effective approaches that can scale embedding model depth (i.e. layers) and embedding size allow for the creation of models that are highly scalable across different computational resources and task requireme...
Long WSDM-2025
Context Embeddings for Efficient Answer Generation in RAG
Abstract Retrieval-Augmented Generation (RAG) allows overcoming the limited knowledge of LLMs by extending the input with external information. As a consequence, the contextual inputs to the model become much longer w...
Resource EMNLP-2024
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
Abstract Retrieval-Augmented Generation enables enhancing Large Language Models with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed, which involve a...
Long ECIR-2025
An Investigation of Prompt Variations for Zero-shot LLM-based Rankers
Abstract We provide a systematic understanding of the impact of specific components and wordings used in prompts on the effectiveness of rankers based on zero-shot Large Language Models (LLMs). Several zero-shot ranki...
Short SIGIR-2024
Large Language Models for Stemming: Promises, Pitfalls and Failures
Abstract Text stemming is a natural language processing technique that is used to reduce words to their base form, also known as the root form. The use of stemming in IR has been shown to often improve the effectivene...
Long SIGIR-2024
Evaluating Generative Ad Hoc Information Retrieval
Abstract Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an info...
Resource SIGIR-2024
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
Abstract Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent. With the increasing uptake of Retrieval-Augmented G...
Short arXiv
ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search
Abstract Federated search, which involves integrating results from multiple independent search engines, will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines empowering LLM-based ...
2023
Long ECIR-2024
Zero-shot Generative Large Language Models for Systematic Review Screening Automation
Abstract Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, esp...
Long SIGIR-AP-2023
Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
Abstract Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. The goal is to prioritise the most important documents so that subsequent review ...
Long SIGIR-2023
Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?
Abstract Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the k...
Reproduce SIGIR-2023
Balanced Topic Aware Sampling for Effective Dense Retriever: A Reproducibility Study
Abstract Knowledge distillation plays a key role in boosting the effectiveness of rankers based on pre-trained language models (PLMs); this is achieved using an effective but inefficient large model to teach a more ef...
Short WSDM-2023
🤗 Live Demo MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction
Abstract Boolean query construction is often critical for medical systematic review literature search. To create an effective Boolean query, systematic review researchers typically spend weeks coming up with effecti...
2022
Long ADCS-2022
Neural Rankers for Effective Screening Prioritization in Medical Systematic Review Literature Search
Abstract Medical systematic reviews typically require that all the documents retrieved by a search are assessed. The reason is two-fold: the task aims for “total recall”; and documents retrieved using Boolean search a...
Journal Intelligent Systems with Applications (ISWA), Technology-Assisted Review Systems Special Issue
Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search
Abstract Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research q...
Short SIGIR-2022
To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers
Abstract Current pre-trained language model approaches to information retrieval can be broadly divided into two categories: sparse retrievers (to which belong also non-neural approaches such as bag-of-words methods, e...
Resource SIGIR-2022
From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search
Abstract Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research q...
Reproduce ECIR-2022
SDR for Systematic Reviews: A Reproducibility Study
Abstract Screening or assessing studies is critical to the quality and outcomes of a systematic review. Typically, a Boolean query retrieves the set of studies to screen. As the set of studies retrieved is unordered, ...
2021
Long ADCS-2021
MeSH Term Suggestion for Systematic Review Literature Search
Abstract High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a k...
Notebook TREC 2021 Deep Learning Track
IELAB at TREC Deep Learning Track 2021
Short ICTIR-2021
BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval
Abstract The integration of deep, pre-trained language models, such as BERT, into retrieval and ranking pipelines has shown to provide large effectiveness gains over traditional bag-of-words models in the passage retr...
