arXiv 2501.04675

Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations

By Archita Srivastava, Abhas Kumar, et al.

Published 2025-01-08

Chart interpretation is crucial for visual data analysis, but accurately extracting information from charts poses significant challenges for automated models. This study investigates the fine-tuning of DEPLOT, a modality conversion module that translates the image of a plot or chart to a linearized table, on a custom dataset of 50,000 bar charts. The dataset comprises simple, stacked, and grouped bar charts, targeti…
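
To make the term "linearized table" concrete, here is a minimal Python sketch of how such a string could be turned back into rows for a downstream question-answering step. It is not the authors' code: the function name `parse_linearized_table`, the `<0x0A>` row separator, the `|` cell separator, and the sample values are all assumptions chosen for illustration.

```python
def parse_linearized_table(text, row_sep="<0x0A>", cell_sep="|"):
    """Split a linearized table string into a header and a list of row dicts.

    The separators are assumptions for this sketch; adjust them to match
    whatever the conversion module actually emits.
    """
    rows = [
        [cell.strip() for cell in line.split(cell_sep)]
        for line in text.split(row_sep)
        if line.strip()  # skip empty fragments
    ]
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    # Pair each cell with its column name; values stay as strings.
    records = [dict(zip(header, row)) for row in body]
    return header, records


# Hypothetical linearized output for a grouped bar chart (invented values).
example = "Quarter | Revenue | Cost <0x0A> Q1 | 120 | 80 <0x0A> Q2 | 150 | 95"
header, records = parse_linearized_table(example)
```

Cell values are kept as strings here; converting them to numbers is left to the caller, since chart-derived tables may contain units or missing entries.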
