Abstract:
Deep CNNs have achieved state-of-the-art performance on
image classification tasks, yet they remain black boxes to the
humans who use them. There is growing interest in explaining
the workings of these deep models to improve their trustworthiness.
In this paper, we introduce a Posthoc Architecture-agnostic
Concept Extractor (PACE) that automatically extracts smaller
sub-regions of the image, called concepts, that are relevant to the
black-box prediction.
PACE tightly integrates faithfulness to the
black-box model into the explanatory framework.
model. To the best of our knowledge, this is the first work
that extracts class-specific discriminative concepts in a posthoc
manner automatically. The PACE framework is used to generate
explanations for two different CNN architectures trained for
classifying the AWA2 and Imagenet-Birds datasets. Extensive
human-subject experiments are conducted to validate the human
interpretability and consistency of the explanations extracted by
PACE. The results of these experiments suggest that over 72%
of the concepts extracted by PACE are human-interpretable.