Abstract:
Convolutional Neural Networks (CNNs) have achieved state-of-the-art image classification
results. The research sub-field of Explainable AI (XAI) aims to unravel the working
mechanisms of these accurate yet opaque black boxes in order to enhance users’ trust and detect
spurious correlations, thereby enabling the pervasive adoption of AI systems. Studies
show that humans process images in terms of sub-regions called concepts; for instance, a
peacock is identified by its green feathers, blue neck, and so on. Explanations in terms of such
concepts have therefore proven helpful for humans to better understand the workings of CNNs.
Existing approaches leverage an external repository of concept examples to extract the
concept representations learned by the CNNs. However, owing to distributional differences
that may exist between the external repository and the data on which the CNN is trained,
the faithfulness of these explanations, i.e., whether the extracted representations truly reflect
the learned representations, is not guaranteed. To circumvent this challenge, the thesis
proposes three novel frameworks that automatically extract the concepts from the data.
The first framework, PACE, automatically extracts class-specific concepts relevant to the
black-box prediction. It tightly integrates the faithfulness of the explanatory framework
into the black-box model. It generates explanations for two different CNN architectures
trained to classify the AWA2 and Imagenet-Birds datasets. Extensive human subject
experiments are conducted to validate the human interpretability and consistency of the
extracted explanations.
While class-specific concepts unravel the blueprint of a class from the CNN’s perspective,
concepts are often shared across classes; for instance, gorillas and chimpanzees naturally
share many characteristics as they belong to the same family. The second framework,
SCE, unravels the sharing of concepts across related classes from the CNN’s perspective. After
training the explainer, the relevance of each extracted concept to the prediction, along with
the primitive image aspects it encodes, such as color, texture, and shape, is estimated. This
sheds light on the concepts by which different black-box architectures trained on the
Imagenet dataset group and distinguish related classes.
The secondary focus of the thesis is to extend the fruits of explainability to allied
learning paradigms that contribute to state-of-the-art image classification successes. Domain
adaptation techniques, which leverage knowledge from an auxiliary source domain for
learning in a target domain with scarce labeled data, increase accuracy. However, the
adaptation process remains opaque, particularly regarding the knowledge leveraged from the source domain.
The third framework, XSDA-Net, uses a case-based reasoning mechanism to explain the
prediction for a test instance in terms of similar-looking regions in the source and target
training images. The utility of the proposed framework is theoretically and empirically
demonstrated by curating domain adaptation settings on datasets popularly known
to exhibit part-based explainability. Ablation analyses show the importance of each
component of the learning objective.
This thesis also provides a comprehensive overview of the XAI field, summarizing the
state-of-the-art contributions to the different types of explanations. The underlying
principles, limitations, and improvements made to these seminal contributions are also
highlighted. Furthermore, the thesis presents future research directions and
unexplored avenues in XAI research.