Abstract:
There are an estimated 39 million blind persons globally, and around one-third are in
India. The accessibility of education is a critical concern for Blind and Visually Impaired
People (BVIP). Blind and visually impaired students rarely opt for STEM subjects
in mid-level or higher education because of the lack of compatible educational material.
The main reason is the inaccessibility of Chart Visualizations (CharVis) or Non-Textual
Components (NTCs) such as charts/plots, tables, and diagrams, which are frequently used in STEM
subjects. The inaccessibility of such visualization components exemplifies one of the pervasive
challenges of information access for BVIP. BVIP, especially in developing countries
such as India, generally rely on braille, tactile materials, BVIP-related assistive technologies (ATs),
and/or a helper for reading. However, existing tools and technologies do
little to address the accessibility concerns related to NTCs for BVIP. Detecting and
identifying NTCs, extracting the underlying values and other data from them,
and summarizing that data remain very challenging. Besides the NTC
accessibility issue, currency identification has always been a troublesome task for BVIP,
especially in developing countries such as India, and it has become more challenging
with the new INR currency notes. BVIP primarily rely on size variations and patterns
such as intaglio printing to recognize currency denominations. Most
current Indian banknotes are similar in size, and their engraved designs are not
distinctive enough for BVIP and fade over time. Developing automated paper currency
recognition systems is also challenging due to issues such as folded or partial views, uneven
illumination, and background clutter.
This thesis presents a detailed survey of existing systems for NTC accessibility and
currency recognition. To improve accessibility, automated pipelines
for NTC understanding and currency recognition are also presented. The first stage of the NTC
understanding pipeline is NTC classification, for which a multi-dilated and context-aggregated
dense network (MDCADNet) model is proposed. MDCADNet efficiently
meets the need of the NTC classification task for a larger receptive field and performs consistently
better than the existing approaches on multiple benchmark datasets. Further,
a deep learning-based framework for detecting tables in digital
documents (DeepDoT) is proposed. The quantitative and qualitative results demonstrate the
superiority of the DeepDoT framework. Following the NTC classification and detection-related
tasks, a methodology for extracting summarized content from line and bar charts
is presented. Besides document accessibility, a robust framework for assisting BVIP in
currency recognition is proposed. The framework includes a lightweight Indian paper
currency recognition network (IPCRNet), useful in a resource-constrained environment, a
large and diverse INR paper currency dataset (IPCD), and a BVIP-compatible interface.
The proposed method shows a marked performance gain over the existing currency
recognition approaches.
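
As a brief illustration of why dilation helps with the larger receptive field mentioned for NTC classification, the following minimal sketch computes the one-dimensional receptive field of a stack of stride-1 convolutions. It is not the MDCADNet architecture: the function name `receptive_field`, the kernel size, and the dilation rates are all hypothetical values chosen only for illustration, since the abstract does not specify them.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 convolutions.

    Each layer with dilation d widens the receptive field by
    (kernel_size - 1) * d input positions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical settings: three 3x3 layers, without and with dilation.
plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
print(plain, dilated)
```

With the same number of layers and parameters, the dilated stack covers a wider input region than the plain one, which is the general motivation for multi-dilated designs.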