
MMRAG: Multi-Mode Retrieval-Augmented Generation with Large Language Models for Biomedical In-Context Learning

Zaifu Zhan, Jun Wang, +2 authors, Rui Zhang

2025 · DOI: 10.48550/arXiv.2502.15954
JAMIA: Journal of the American Medical Informatics Association · Cited by 6

TLDR

MMRAG effectively enhances biomedical in-context learning by refining example selection, mitigating data scarcity issues, and demonstrating superior adaptability for NLP-driven healthcare applications.

Abstract

OBJECTIVES: To optimize in-context learning in biomedical natural language processing by improving example selection.

MATERIALS AND METHODS: We introduce a novel multi-mode retrieval-augmented generation (MMRAG) framework, which integrates 4 retrieval strategies: (1) Random Mode, selecting examples arbitrarily; (2) Top Mode, retrieving the most relevant examples based on similarity; (3) Diversity Mode, ensuring variation in selected examples; and (4) Class Mode, selecting category-representative examples. This study evaluates MMRAG on 3 core biomedical NLP tasks: Named Entity Recognition (NER), Relation Extraction (RE), and Text Classification (TC). The datasets used include BC2GM for gene and protein mention recognition (NER), DDI for drug-drug interaction extraction (RE), GIT for general biomedical information extraction (RE), and HealthAdvice for health-related text classification (TC). The framework is tested with 2 large language models (Llama-2-7B and Llama-3-8B) and 3 retrievers (Contriever, MedCPT, and BGE-Large) to assess performance across different retrieval strategies.

RESULTS: The results from the Random Mode indicate that providing more examples in the prompt improves the model's generation performance. Meanwhile, Top Mode and Diversity Mode significantly outperform Random Mode on the RE (DDI) task, achieving an F1 score of 0.9669, a 26.4% improvement. Among the 3 retrievers tested, Contriever outperformed the other 2 in a greater number of experiments. Additionally, Llama 2 and Llama 3 demonstrated varying capabilities across different tasks, with Llama 3 showing a clear advantage in handling NER tasks.

CONCLUSION: MMRAG effectively enhances biomedical in-context learning by refining example selection, mitigating data scarcity issues, and demonstrating superior adaptability for NLP-driven healthcare applications.
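The four retrieval modes described in the abstract can be sketched in a few lines of code. This is a minimal, illustrative sketch, not the paper's implementation: cosine similarity over toy embedding vectors stands in for the actual retrievers (Contriever, MedCPT, BGE-Large), and the specific heuristics assumed here (greedy max-min selection for Diversity Mode, per-class round-robin for Class Mode) are plausible instantiations rather than the authors' exact algorithms.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(query_emb, pool, k, mode, rng=None):
    """Select k in-context examples from `pool` for a query embedding.

    pool: list of dicts, each with an "emb" vector and a "label" (class).
    mode: "random", "top", "diversity", or "class".
    """
    rng = rng or random.Random(0)
    if mode == "random":
        # (1) Random Mode: arbitrary selection.
        return rng.sample(pool, k)

    # Rank the pool by relevance to the query once, reuse for all other modes.
    scored = sorted(pool, key=lambda ex: cosine(query_emb, ex["emb"]), reverse=True)
    if mode == "top":
        # (2) Top Mode: the k most similar examples.
        return scored[:k]
    if mode == "diversity":
        # (3) Diversity Mode: start from the most relevant example, then
        # greedily add the candidate least similar to anything chosen so far.
        chosen, candidates = [scored[0]], scored[1:]
        while len(chosen) < k and candidates:
            best = min(candidates,
                       key=lambda ex: max(cosine(ex["emb"], c["emb"]) for c in chosen))
            chosen.append(best)
            candidates.remove(best)
        return chosen
    if mode == "class":
        # (4) Class Mode: round-robin over classes, taking each class's
        # most query-relevant example first, so every category is represented.
        by_label = {}
        for ex in scored:
            by_label.setdefault(ex["label"], []).append(ex)
        chosen = []
        while len(chosen) < k:
            progressed = False
            for bucket in by_label.values():
                if bucket and len(chosen) < k:
                    chosen.append(bucket.pop(0))
                    progressed = True
            if not progressed:
                break
        return chosen
    raise ValueError(f"unknown mode: {mode}")
```

For example, with a pool of labeled examples embedded in 2-D, `mode="top"` returns the nearest neighbors of the query, `mode="diversity"` trades some relevance for spread, and `mode="class"` guarantees each label appears among the selected demonstrations.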
