Enhancing VLM Grounding: A Pipeline for Fine-Grained Data Synthesis and Reinforcement Learning
A deep dive into our research on improving the spatial reasoning of Vision-Language Models (VLMs) through a two-level data distillation pipeline and a customized Chain-of-Thought reward …
