Howdy! I’m Shuai Wang, a third-year PhD student who’s been swimming in the deep end of academia for what feels like forever. I’m under the eagle eyes of Professor Guido Zuccon, Associate Professor Bevan Koopman, and Dr. Harrisen Scells.
I received my Bachelor of Science degree from the University of Western Australia in 2019, then obtained a Master of Engineering Science degree from the University of Queensland in 2021.
My research centres on information retrieval (IR) and natural language processing (NLP). My PhD topic focuses on a domain-specific application, the automation of medical systematic reviews, where I am currently working on automatic MeSH term suggestion, screening prioritisation, seed-driven methods, and Boolean query formulation. I also run side projects on general IR and NLP tasks, such as federated RAG and fusion of rankers.
I’m also a tutor at UQ, teaching courses like INFS7410 (Information Retrieval and Web Search). I’m not in it for the cash but for the chance to scout for other brainiacs to collaborate with. So, if you’re interested, let’s chat and see if we can cook up some research magic together!
I completed an internship at Naver Labs Europe from February to July 2024, with a research focus on context compression for retrieval-augmented generation (RAG).
Job Opportunities
I expect to graduate at the end of 2024, and I’m currently looking for job opportunities in academia and industry. If you think I’m a good fit for your team, please feel free to contact me.
News
Paper Accepted at WSDM 2025
[Full Paper]: Context Embeddings for Efficient Answer Generation in RAG
Three Papers Accepted at SIGIR 2024
[Resource Paper]: FeB4RAG; [Short Paper]: LLM for stemming; [Perspective Paper]: Evaluating ad-hoc generative IR
Paper Presentation at ECIR, Glasgow
Presented the paper "Zero-shot Generative Large Language Models for Systematic Review Screening Automation".
Services
I serve as a reviewer/PC member for the following journals and conferences:
- TOIS: ACM Transactions on Information Systems
- Journal of Data and Information Quality
- ACM ICTIR 2023, SIGIR 2024
Publications
Starbucks: Improved Training for 2D Matryoshka Embeddings Long
Shengyao Zhuang*, Shuai Wang*, Bevan Koopman and Guido Zuccon. 2024. Starbucks: Improved Training for 2D Matryoshka Embeddings. (arXiv preprint).
Context Embeddings for Efficient Answer Generation in RAG Long
David Rau*, Shuai Wang*, Hervé Déjean and Stéphane Clinchant. 2024. Context Embeddings for Efficient Answer Generation in RAG. (Accepted at WSDM 2025).
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation Resource
David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina and Stéphane Clinchant. 2024. BERGEN: A Benchmarking Library for Retrieval-Augmented Generation. (Accepted at EMNLP 2024 Findings).
Large Language Models for Stemming: Promises, Pitfalls and Failures Short
Shuai Wang, Shengyao Zhuang and Guido Zuccon. 2024. Large Language Models for Stemming: Promises, Pitfalls and Failures. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024).
Evaluating Generative Ad Hoc Information Retrieval Long
Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen and Martin Potthast. 2024. Evaluating Generative Ad Hoc Information Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024).
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation Resource
Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang and Guido Zuccon. 2024. FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024).
ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search Long
Shuai Wang, Shengyao Zhuang, Bevan Koopman and Guido Zuccon. 2024. ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search. (arXiv preprint).
Zero-shot Generative Large Language Models for Systematic Review Screening Automation Long
Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman and Guido Zuccon. 2023. Zero-shot Generative Large Language Models for Systematic Review Screening Automation. In Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024).
Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation Long
Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman and Guido Zuccon. 2023. Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation. In Proceedings of the International ACM SIGIR Conference on Information Retrieval in the Asia Pacific, November 26-29, 2023 (SIGIR-AP 2023).
Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search? Long
Shuai Wang, Harrisen Scells, Bevan Koopman and Guido Zuccon. 2023. Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search? In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023).
Balanced Topic Aware Sampling for Effective Dense Retriever: A Reproducibility Study Reproduce
Shuai Wang and Guido Zuccon. 2023. Balanced Topic Aware Sampling for Effective Dense Retriever: A Reproducibility Study. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023).
MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction Short
Shuai Wang, Hang Li and Guido Zuccon. 2023. MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction. In Proceedings of the 16th ACM International Conference on Web Search and Data Mining (WSDM 2023).
Neural Rankers for Effective Screening Prioritization in Medical Systematic Review Literature Search Long
Shuai Wang, Harry Scells, Bevan Koopman and Guido Zuccon. 2022. Neural Rankers for Effective Screening Prioritization in Medical Systematic Review Literature Search. In Australasian Document Computing Symposium (ADCS 2022).
Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search Journal
Shuai Wang, Harry Scells, Bevan Koopman and Guido Zuccon. 2022. Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search. In Intelligent Systems with Applications (ISWA), Technology-Assisted Review Systems Special Issue.
To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers Short
Hang Li*, Shuai Wang*, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin and Guido Zuccon. 2022. To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022).
From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search Resource
Shuai Wang, Harry Scells, Justin Clark, Guido Zuccon and Bevan Koopman. 2022. From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022).
SDR for Systematic Reviews: A Reproducibility Study Reproduce
Shuai Wang, Harry Scells, Ahmed Mourad and Guido Zuccon. 2022. SDR for Systematic Reviews: A Reproducibility Study. In Proceedings of the 44th European Conference on Information Retrieval (ECIR 2022).
MeSH Term Suggestion for Systematic Review Literature Search Long
Shuai Wang, Hang Li, Harry Scells, Daniel Locke and Guido Zuccon. 2021. MeSH Term Suggestion for Systematic Review Literature Search. In Australasian Document Computing Symposium (ADCS 2021).
IELAB at TREC Deep Learning Track 2021 Notebook
Shengyao Zhuang, Hang Li, Shuai Wang and Guido Zuccon. 2021. IELAB at TREC Deep Learning Track 2021. In TREC 2021 Deep Learning Track.
BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval Short
Shuai Wang, Shengyao Zhuang and Guido Zuccon. 2021. BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval. In Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR 2021).