📄 NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
👥 Authors: Lingfeng Ren, Weihao Yu, Runpeng Yu, Xinchao Wang
📅 Published: 2026-02-25
🔥 Upvotes: 1
🎯 What This Research Is About
Object hallucination is a critical issue in Large Vision-Language Models (LVLMs), where AI systems generate descriptions of objects that don't actually appear in the input image. This research investigates a fundamental question: which component is primarily responsible for these hallucinations—the vision encoder that processes images, or the language decoder that generates text?
Through systematic experiments, the researchers found that object hallucinations predominantly stem from strong priors in the language decoder. Based on this insight, they developed NoLan (No-Language-Hallucination Decoding), a simple training-free framework that refines the output distribution by dynamically suppressing language priors, using the difference between the model's predictions under multimodal input and under text-only input.
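The core idea, contrasting a multimodal pass against a text-only pass, can be sketched as follows. This is a minimal illustration of generic contrastive-decoding-style suppression, not NoLan's exact formulation: the function name `suppress_language_prior`, the fixed `alpha` weight, and the toy logits are all assumptions (the paper's weighting is dynamic, and its precise rule is not given in this summary).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def suppress_language_prior(mm_logits, text_logits, alpha=1.0):
    """Contrastive adjustment (hypothetical sketch): amplify evidence
    present in the multimodal pass but absent from the text-only pass,
    thereby down-weighting tokens favored purely by the language prior."""
    adjusted = mm_logits + alpha * (mm_logits - text_logits)
    return softmax(adjusted)

# Toy next-token vocabulary: ["cat", "dog", "table"]
mm = np.array([2.0, 0.5, 1.0])    # logits given image + text
txt = np.array([2.0, 2.5, 0.0])   # text-only logits: the prior favors "dog"

p_plain = softmax(mm)
p_adj = suppress_language_prior(mm, txt)
# "dog", driven mainly by the language prior, is suppressed
# relative to plain decoding from the multimodal logits.
assert p_adj[1] < p_plain[1]
```

The contrast term `mm_logits - text_logits` is large only for tokens whose probability depends on actually seeing the image, so prior-driven (hallucination-prone) tokens lose mass without retraining the model.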
💡 Why This Matters
- Critical Problem Solved: Addresses object hallucinations where AI models "see" objects that don't exist in images—a major reliability issue in vision-language systems
- Training-Free Solution: Can be applied to existing models without retraining, making it immediately practical and cost-effective for deployment
- Significant Improvements: Achieves accuracy gains of up to 6.45 points for LLaVA-1.5 7B and 7.21 points for Qwen-VL 7B on the POPE benchmark
- Root Cause Identified: Reveals that language decoder biases, not vision processing, are the primary culprit—shifting how we approach this problem
- Universal Applicability: Works across various LVLMs and different tasks, demonstrating broad effectiveness
Curated from Hugging Face daily papers • February 26, 2026