Posts by Collection

awards

Best Student Paper Award

Best Student Paper Award at ADCS 2021 for the paper "MeSH Term Suggestion for Systematic Review Literature Search".

publications

Long Australasian Document Computing Symposium (ADCS 2021)

MeSH Term Suggestion for Systematic Review Literature Search

Shuai Wang and Hang Li and Harry Scells and Daniel Locke and Guido Zuccon.

Abstract High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a k...
Reproduce Proceedings of the 44th European Conference on Information Retrieval (ECIR 2022)

SDR for Systematic Reviews: A Reproducibility Study

Shuai Wang and Harry Scells and Ahmed Mourad and Guido Zuccon.

Abstract Screening or assessing studies is critical to the quality and outcomes of a systematic review. Typically, a Boolean query retrieves the set of studies to screen. As the set of studies retrieved is unordered, ...
Resource Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search

Shuai Wang and Harry Scells and Justin Clark and Guido Zuccon and Bevan Koopman.

Abstract Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research q...
Short Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

Hang Li* and Shuai Wang* and Shengyao Zhuang and Ahmed Mourad and Xueguang Ma and Jimmy Lin and Guido Zuccon.

Abstract Current pre-trained language model approaches to information retrieval can be broadly divided into two categories: sparse retrievers (to which belong also non-neural approaches such as bag-of-words methods, e...
Reproduce Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

Balanced Topic Aware Sampling for Effective Dense Retriever: A Reproducibility Study

Shuai Wang and Guido Zuccon.

Abstract Knowledge distillation plays a key role in boosting the effectiveness of rankers based on pre-trained language models (PLMs); this is achieved using an effective but inefficient large model to teach a more ef...
Long Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?

Shuai Wang, Harrisen Scells, Bevan Koopman and Guido Zuccon.

Abstract Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the k...
Long SIGIR-2024

Evaluating Generative Ad Hoc Information Retrieval

Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen and Martin Potthast.

Abstract Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an info...
Resource EMNLP2024

BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina and Stéphane Clinchant.

Abstract Retrieval-Augmented Generation allows to enhance Large Language Models with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed, which involve a...
Long WSDM2025

Context Embeddings for Efficient Answer Generation in RAG

David Rau*, Shuai Wang*, Hervé Déjean and Stéphane Clinchant.

Abstract Retrieval-Augmented Generation (RAG) allows overcoming the limited knowledge of LLMs by extending the input with external information. As a consequence, the contextual inputs to the model become much longer w...
Long ECIR2026

Starbucks: Improved Training for 2D Matryoshka Embeddings

Shengyao Zhuang*, Shuai Wang*, Bevan Koopman and Guido Zuccon.

Abstract Effective approaches that can scale embedding model depth (i.e. layers) and embedding size allow for the creation of models that are highly scalable across different computational resources and task requireme...
Reproduce Accepted SIGIR-2025

2D Matryoshka Training for Information Retrieval

Shuai Wang, Shengyao Zhuang, Bevan Koopman and Guido Zuccon.

Abstract 2D Matryoshka Training is an advanced embedding representation training approach designed to train an encoder model simultaneously across various layer-dimension setups. This method has demonstrated higher ef...
Reproduce Accepted SIGIR-2026

The Vulnerability of LLM Rankers to Prompt Injection Attacks

Yu Yin, Shuai Wang, Bevan Koopman and Guido Zuccon.

Abstract Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has shown that simple prompt injections embedded within a candidate document (jailbreak prompt attacks) can significantly alte...

talks

teaching