Zero-shot Generative Large Language Models for Systematic Review Screening Automation

Published in ECIR 2024, 2024

Recommended citation: Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman and Guido Zuccon. 2024. Zero-shot Generative Large Language Models for Systematic Review Screening Automation. In Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024). https://link.springer.com/chapter/10.1007/978-3-031-56027-9_25

Abstract

Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.
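The calibration idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes access to model relevance scores and include/exclude labels for a held-out calibration set, and picks the score threshold that retains at least a target fraction of the relevant (included) publications. The function name and inputs are hypothetical.

```python
import math

def calibrate_threshold(scores, labels, target_recall=0.95):
    """Pick a score threshold so that at least `target_recall` of the
    relevant publications in the calibration set score at or above it.

    scores: per-publication relevance scores from the LLM (higher = more relevant)
    labels: 1 if the publication was included in the review, else 0
    """
    # Scores of the relevant (included) publications, highest first.
    relevant = sorted((s for s, y in zip(scores, labels) if y == 1),
                      reverse=True)
    if not relevant:
        raise ValueError("calibration set contains no relevant publications")
    # Keep the top-k relevant scores, where k covers the target recall.
    k = math.ceil(target_recall * len(relevant))
    return relevant[k - 1]

def screen(scores, threshold):
    """Flag publications whose score reaches the calibrated threshold."""
    return [s >= threshold for s in scores]
```

For example, with relevant-document scores [0.9, 0.8, 0.7, 0.2] and a target recall of 0.75, the threshold lands at 0.7, so three of the four relevant publications are kept while low-scoring ones can be excluded from manual screening.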