Abstract
Breast carcinoma (BC) remains one of the most common and lethal malignancies in women worldwide, making early and accurate diagnosis a public health priority. Recent artificial intelligence (AI) research has increasingly explored multimodal fusion, combining imaging, clinical records, histopathology, and genomic data to produce richer, more reliable predictions. In parallel, explainable AI (XAI) techniques aim to address a key barrier to clinical adoption: transparency about how deep learning models reach their decisions. This systematic review examines 49 peer-reviewed studies published between 2015 and 2025, following the PRISMA guidelines, to analyze the landscape of multimodal learning and XAI in the diagnosis and prognosis of BC. We categorize reported fusion strategies, ranging from simple feature concatenation to advanced attention-based, gated, and hybrid architectures designed to manage data heterogeneity and missing modalities. We also document model designs, including transfer learning, transformers, graph neural networks (GNNs), autoencoders, and ensembles, supported by preprocessing methods such as stain normalization and GAN-based augmentation, as well as the use of XAI techniques, including Grad-CAM, SHAP, and attention weights, which help bridge the gap between complex AI systems and clinical workflows. Across the included studies, multimodal models often outperformed unimodal baselines; however, effect sizes varied by dataset, validation design (cross-validation versus external validation), and the handling of missing modalities. Our review highlights persistent open problems, including the limited availability of multimodal datasets, inconsistent benchmarks, and the scarcity of interpretable models in real-world settings. Future research on BC care should focus on developing ensemble-based fusion approaches, validating them across diverse clinical settings, and embedding clinician expertise, all of which are crucial for building AI systems that are accurate, transparent, generalizable, and trustworthy.