- Translating machine learning research into clinical practice has several challenges. In this paper, we identify some critical issues in translating research to clinical practice in the context of medical image segmentation and propose strategies to systematically address these challenges. Specifically, we focus on cases where the model yields erroneous segmentation, which we define as corner cases. One of the standard metrics used for reporting the performance of medical image segmentation algorithms is the average Dice score across all patients. We have discovered that this aggregate reporting has the inherent drawback that the corner cases where the algorithm or model has erroneous performance or very low metrics go unnoticed. Due to this reporting, models that report superior performance could end up producing completely erroneous results, or even anatomically impossible results in a few challenging cases, albeit without being noticed.We have demonstrated how corner cases goTranslating machine learning research into clinical practice has several challenges. In this paper, we identify some critical issues in translating research to clinical practice in the context of medical image segmentation and propose strategies to systematically address these challenges. Specifically, we focus on cases where the model yields erroneous segmentation, which we define as corner cases. One of the standard metrics used for reporting the performance of medical image segmentation algorithms is the average Dice score across all patients. We have discovered that this aggregate reporting has the inherent drawback that the corner cases where the algorithm or model has erroneous performance or very low metrics go unnoticed. Due to this reporting, models that report superior performance could end up producing completely erroneous results, or even anatomically impossible results in a few challenging cases, albeit without being noticed.We have demonstrated how corner cases go unnoticed using the Magnetic Resonance (MR) cardiac image segmentation task of the Automated Cardiac Diagnosis Challenge (ACDC) challenge. To counter this drawback, we propose a framework that helps to identify and report corner cases. Further, we propose a novel balanced checkpointing scheme capable of finding a solution that has superior performance even on these corner cases. Our proposed scheme leads to an improvement of 44.6% for LV, 46.1% for RV and 38.1% for the Myocardium on our identified corner case in the ACDC segmentation challenge. Further, we establish the generalisability of our proposed framework by also demonstrating its applicability in the context of chest X-ray lung segmentation. This framework has broader applications across multiple deep learning tasks even beyond medical image segmentation.…