| dc.description.abstract |
Performance plays an important role when selecting a deep neural network (DNN) for
decision-making. However, in healthcare, relying on a DNN's decision without understanding its certainty or calibration
can be risky. This thesis focuses on addressing the challenges of analyzing uncertainty
and calibration in DNNs, specifically for medical imaging tasks such as segmentation and
classification.
Uncertainty in predictions arises from inherent noise in the data (aleatoric) and from the
model's limited knowledge (epistemic). While epistemic uncertainty can be mitigated with more data
or larger models, addressing aleatoric uncertainty is more challenging. This work aims to reduce
aleatoric uncertainty in a downstream segmentation task through a two-stage approach: (a) a
self-supervised task, specifically a reconstruction task, that estimates aleatoric uncertainty
alongside its predictions, akin to learning the output distribution, and (b) leveraging additional
samples drawn from the learned distribution to reduce aleatoric uncertainty in the segmentation
task. Sampling from high-uncertainty regions of the reconstruction highlights areas where
the model is less confident, and incorporating these samples improves predictions. The
proposed method, evaluated on the benchmark Brain Tumor Segmentation (BraTS) dataset,
demonstrated a significant reduction in aleatoric uncertainty for the segmentation task while
achieving performance comparable to or better than standard augmentation techniques.
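The two-stage approach above can be sketched generically: train the reconstruction network with a heteroscedastic Gaussian negative log-likelihood so that a predicted variance map tracks aleatoric uncertainty, then draw extra inputs from the learned distribution. The following is a minimal NumPy illustration of that general technique, not the thesis's exact formulation; all function names, shapes, and values are hypothetical.

```python
import numpy as np

def gaussian_nll(x, mu, log_var):
    # Stage (a): per-pixel Gaussian negative log-likelihood. Training a
    # reconstruction network to predict both mu and log_var makes
    # exp(log_var) track the input noise, i.e. aleatoric uncertainty.
    return 0.5 * np.mean(np.exp(-log_var) * (x - mu) ** 2 + log_var)

def sample_from_learned_distribution(mu, log_var, n_samples, rng):
    # Stage (b): draw additional inputs from the learned output
    # distribution; high-uncertainty pixels vary most across samples.
    sigma = np.exp(0.5 * log_var)
    noise = rng.standard_normal((n_samples,) + mu.shape)
    return mu[None, ...] + sigma[None, ...] * noise

rng = np.random.default_rng(0)
mu = np.zeros((4, 4))                       # toy "reconstruction"
log_var = np.full((4, 4), np.log(0.25))     # sigma = 0.5 everywhere
samples = sample_from_learned_distribution(mu, log_var, 16, rng)
```

In practice `mu` and `log_var` would be outputs of the reconstruction network, and the drawn samples would serve as extra training inputs for the downstream segmentation model.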
To investigate the calibration of DNNs, this thesis focuses on two key aspects: first,
understanding how confidence calibration is affected under varying conditions, and second,
improving the calibration of DNNs so that their probability scores can be reliably used for
decision-making. To address the first aspect, a comprehensive empirical study was conducted
to evaluate performance and calibration across different scenarios. The experiments involved
combinations of three medical imaging datasets, four dataset sizes (ranging from small to
large), three architecture sizes (small to large), and three training regimes (fully supervised
and self-supervised, with and without pretraining). Additionally, the study examined
the factors within DNNs that influence changes in calibration. Key findings include: (a)
self-supervised learning improves calibration without compromising performance, (b) dataset
characteristics significantly impact both calibration and performance, and (c) employing
multiple calibration metrics is crucial for a comprehensive evaluation of calibration error,
as relying on a single metric can lead to misleading conclusions.
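Finding (c) concerns calibration metrics. The most common of these, expected calibration error (ECE), can be sketched as follows (a standard top-label estimator; the variable names are illustrative):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Standard top-label ECE: bin predictions by confidence, then take
    # the bin-size-weighted average of |accuracy - mean confidence|.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy example: two overconfident and two underconfident predictions.
ece = expected_calibration_error(np.array([0.9, 0.9, 0.6, 0.6]),
                                 np.array([1.0, 0.0, 1.0, 1.0]))
```

Because ECE depends on the binning scheme and considers only the top predicted class, complementary metrics (e.g. class-wise or adaptive-binning variants) are needed for the kind of comprehensive evaluation the study advocates.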
To improve the calibration of DNNs, various methods have been proposed, ranging from
post-hoc adjustments to train-time strategies. However, these approaches often come at the
cost of reduced performance. Moreover, many techniques focus on improving calibration
for the most confident predicted class rather than addressing calibration across all classes.
This thesis aims to improve class-wise calibration without compromising performance. To
achieve this, a novel method called Label Smoothing Plus (LS+) was introduced. LS+
incorporates class-wise priors, estimated from validation set accuracies, during training
to produce better-calibrated predictions. The proposed approach was evaluated on three
benchmark medical imaging datasets, including one with significant class imbalance,
using multiple performance and calibration metrics across two architectures. The results
demonstrated a significant reduction in miscalibration, yielding predicted confidence scores
that are substantially more reliable for clinical decision-making. |
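To illustrate the general idea of prior-aware smoothing (a generic sketch under assumed conventions, not the thesis's exact LS+ formulation; the prior values and names here are hypothetical), class-wise priors can replace the uniform distribution used in standard label smoothing:

```python
import numpy as np

def prior_smoothed_targets(labels, class_priors, alpha=0.1):
    # Standard label smoothing mixes the one-hot target with a uniform
    # distribution; here it is mixed with class-wise priors instead
    # (e.g. weights derived from per-class validation accuracy).
    priors = np.asarray(class_priors, dtype=float)
    priors = priors / priors.sum()          # ensure a valid distribution
    one_hot = np.eye(len(priors))[np.asarray(labels)]
    return (1.0 - alpha) * one_hot + alpha * priors[None, :]

# Toy example: 3 classes with hypothetical priors, two training labels.
targets = prior_smoothed_targets([0, 2], [0.2, 0.3, 0.5], alpha=0.1)
```

Classes the model handles poorly on the validation set receive more smoothing mass, which is the intuition behind using class-wise rather than uniform priors.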
en_US |