| dc.description.abstract |
Performance plays an important role when selecting a deep neural network (DNN) for
decision-making. However, in healthcare, relying on a DNN's decision without understanding its certainty or calibration
can be risky. This thesis focuses on addressing the challenges of analyzing uncertainty
and calibration in DNNs, specifically for medical imaging tasks such as segmentation and
classification.
Uncertainty in predictions arises from inherent noise in the data (aleatoric) and from the
model's limited knowledge (epistemic). While epistemic uncertainty can be mitigated with more data
or larger models, addressing aleatoric uncertainty is more challenging. This work aims to reduce
aleatoric uncertainty in a downstream segmentation task through a two-stage approach: (a) a
self-supervised task, specifically a reconstruction task, that estimates aleatoric uncertainty
alongside its predictions, akin to learning the output distribution, and (b) leveraging additional
samples drawn from the learned distribution to reduce aleatoric uncertainty in the segmentation
task. Sampling from high-uncertainty regions of the reconstruction highlights areas where
the model is less confident, and incorporating these samples improves predictions. The
proposed method, evaluated on the benchmark Brain Tumor Segmentation (BraTS) dataset,
demonstrated a significant reduction in aleatoric uncertainty for the segmentation task while
achieving performance comparable to or better than standard augmentation techniques.
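The two-stage approach above can be sketched generically: train the reconstruction network with a heteroscedastic Gaussian negative log-likelihood so that a predicted variance map tracks aleatoric uncertainty, then draw extra inputs from the learned distribution. The following is a minimal NumPy illustration of that general technique, not the thesis's exact formulation; all function names, shapes, and values are hypothetical.

```python
import numpy as np

def gaussian_nll(x, mu, log_var):
    # Stage (a): per-pixel Gaussian negative log-likelihood. Training a
    # reconstruction network to predict both mu and log_var makes
    # exp(log_var) track the input noise, i.e. aleatoric uncertainty.
    return 0.5 * np.mean(np.exp(-log_var) * (x - mu) ** 2 + log_var)

def sample_from_learned_distribution(mu, log_var, n_samples, rng):
    # Stage (b): draw additional inputs from the learned output
    # distribution; high-uncertainty pixels vary most across samples.
    sigma = np.exp(0.5 * log_var)
    noise = rng.standard_normal((n_samples,) + mu.shape)
    return mu[None, ...] + sigma[None, ...] * noise

rng = np.random.default_rng(0)
mu = np.zeros((4, 4))                       # toy "reconstruction"
log_var = np.full((4, 4), np.log(0.25))     # sigma = 0.5 everywhere
samples = sample_from_learned_distribution(mu, log_var, 16, rng)
```

In practice `mu` and `log_var` would be outputs of the reconstruction network, and the drawn samples would serve as extra training inputs for the downstream segmentation model.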
To investigate the calibration of DNNs, this thesis focuses on two key aspects: first,
understanding how confidence calibration is affected under varying conditions, and second,
improving the calibration of DNNs so that their probability scores can be reliably used for
decision-making. To address the first aspect, a comprehensive empirical study was conducted
to evaluate performance and calibration across different scenarios. The experiments involved
combinations of three medical imaging datasets, four dataset sizes (ranging from small to
large), three architecture sizes (small to large), and three training regimes (fully supervised
and self-supervised, with and without pretraining). Additionally, the study examined
the factors within DNNs that influence changes in calibration. Key findings include: (a)
self-supervised learning improves calibration without compromising performance, (b) dataset
characteristics significantly impact both calibration and performance, and (c) employing
multiple calibration metrics is crucial for a comprehensive evaluation of calibration error,
as relying on a single metric can lead to misleading conclusions.
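Finding (c) concerns calibration metrics. The most common of these, expected calibration error (ECE), can be sketched as follows (a standard top-label estimator; the variable names are illustrative):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Standard top-label ECE: bin predictions by confidence, then take
    # the bin-size-weighted average of |accuracy - mean confidence|.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy example: two overconfident and two underconfident predictions.
ece = expected_calibration_error(np.array([0.9, 0.9, 0.6, 0.6]),
                                 np.array([1.0, 0.0, 1.0, 1.0]))
```

Because ECE depends on the binning scheme and considers only the top predicted class, complementary metrics (e.g. class-wise or adaptive-binning variants) are needed for the kind of comprehensive evaluation the study advocates.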
To improve the calibration of DNNs, various methods have been proposed, ranging from
post-hoc adjustments to train-time strategies. However, these approaches often come at the
cost of reduced performance. Moreover, many techniques focus on improving calibration
for the most confident predicted class rather than addressing calibration across all classes.
This thesis aims to improve class-wise calibration without compromising performance. To
achieve this, a novel method called Label Smoothing Plus (LS+) was introduced. LS+
incorporates class-wise priors, estimated from validation set accuracies, during training
to produce better-calibrated predictions. The proposed approach was evaluated on three
benchmark medical imaging datasets, including one with significant class imbalance,
using multiple performance and calibration metrics across two architectures. The results
demonstrated a significant reduction in miscalibration, yielding predicted confidence scores
that are substantially more reliable for clinical decision-making. |
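To illustrate the general idea of prior-aware smoothing (a generic sketch under assumed conventions, not the thesis's exact LS+ formulation; the prior values and names here are hypothetical), class-wise priors can replace the uniform distribution used in standard label smoothing:

```python
import numpy as np

def prior_smoothed_targets(labels, class_priors, alpha=0.1):
    # Standard label smoothing mixes the one-hot target with a uniform
    # distribution; here it is mixed with class-wise priors instead
    # (e.g. weights derived from per-class validation accuracy).
    priors = np.asarray(class_priors, dtype=float)
    priors = priors / priors.sum()          # ensure a valid distribution
    one_hot = np.eye(len(priors))[np.asarray(labels)]
    return (1.0 - alpha) * one_hot + alpha * priors[None, :]

# Toy example: 3 classes with hypothetical priors, two training labels.
targets = prior_smoothed_targets([0, 2], [0.2, 0.3, 0.5], alpha=0.1)
```

Classes the model handles poorly on the validation set receive more smoothing mass, which is the intuition behind using class-wise rather than uniform priors.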
en_US |