arXiv 2501.04675

Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations

By Archita Srivastava, Abhas Kumar, et al.

Published 2025-01-08

Chart interpretation is crucial for visual data analysis, but accurately extracting information from charts poses significant challenges for automated models. This study investigates the fine-tuning of DEPLOT, a modality conversion module that translates the image of a plot or chart to a linearized table, on a custom dataset of 50,000 bar charts. The dataset comprises simple, stacked, and grouped bar charts, targeti…
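
To make the idea of a "linearized table" concrete, here is a minimal sketch of how such a string could be turned back into structured rows for downstream question answering. The separators (a pipe between cells, a newline between rows), the function name `parse_linearized_table`, and the sample values are all assumptions made for illustration; the abstract does not specify the exact output format used in the study.

```python
def parse_linearized_table(text, row_sep="\n", cell_sep="|"):
    """Split a linearized table string into a header and a list of row dicts.

    The separators are assumptions for this sketch, not the paper's format.
    """
    rows = [
        [cell.strip() for cell in line.split(cell_sep)]
        for line in text.split(row_sep)
        if line.strip()  # skip empty lines
    ]
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    # Pair each cell with its column name; values stay as strings.
    return header, [dict(zip(header, row)) for row in body]


# Hypothetical linearized output for a grouped bar chart (made-up values).
example = "Quarter | Revenue | Cost\nQ1 | 120 | 80\nQ2 | 150 | 95"
header, records = parse_linearized_table(example)
```

Keeping the values as strings leaves numeric conversion to the caller, since chart tables may mix numbers, percentages, and currency labels.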