Abstract:
An efficient table detection process offers a solution for enterprises dealing with automated analysis of digital documents. Table detection is a challenging task due to low inter-class and high intra-class dissimilarities in document images. Further, the foreground-background class imbalance problem limits the performance of table detectors (especially single stage table detectors). The existing table detectors rely on a bottom-up scheme that efficiently captures the semantic features but fails in accounting for the resolution enriched features, thus, affecting the overall detection performance. We propose an end to end trainable framework (DeepDoT), which effectively detect the tables (of different sizes) over arbitrary scales in document images. The DeepDoT utilizes a top-down as well as a bottom-up approach, and additionally, it uses focal loss for handling the pervasive class imbalance problem for accurate predictions. We consider multiple benchmark datasets: ICDAR-2013, UNLV, ICDAR-2017 POD, and MARMOT for a thorough evaluation. The proposed approach yields comparatively better performance in terms of F1-score as compared to state-of-the-art table detection approaches.