Abstract:
Chart Data Extraction (CDE) is a complex document analysis task that recovers the underlying data from charts to support applications such as document mining, medical
diagnosis, and accessibility for the visually impaired. CDE is challenging because of the intricate structure and specific semantics of
charts, which comprise components such as titles, axes, legends, and plot
elements. Existing CDE solutions have not yet addressed these challenges satisfactorily. In this paper, we focus on two critical
subtasks in CDE, Legend Analysis and Axis Analysis, and present LYLAA, a
method that combines a lightweight YOLO-based detector with domain-specific
heuristic matching algorithms (Axis Matching and Legend Matching).
We evaluate LYLAA on a real-world dataset, the ICPR2022 UB PMC dataset, and
observe promising results relative to the teams that competed in the
ICPR2022 CHART-Infographics competition. These findings demonstrate the potential of our method for the CDE process.