Hybrid Vision Transformer for Brain and Lung Tumor Detection: A Multi-Modal Approach Using MRI (BraTS) and CT (LUNA16) Datasets

Hewa Majeed Zangana; Mohammed Aquil Mirza; Sharyar Wani; Xinwei Cao; Marwan Omar

doi:10.12928/biste.v7i4.14766

Authors

Hewa Majeed Zangana, Duhok Polytechnic University, https://orcid.org/0000-0001-7909-254X
Mohammed Aquil Mirza, The Hong Kong Polytechnic University (PolyU)
Sharyar Wani, International Islamic University Malaysia (IIUM)
Xinwei Cao, Jiangnan University
Marwan Omar, Illinois Institute of Technology

Keywords:

Vision Transformer (ViT), Hybrid Transformer Architecture, Multi-Modal Medical Imaging, MRI–CT Fusion, Tumor Detection, Explainable AI in Radiology, BraTS, LUNA16

Abstract

The integration of artificial intelligence (AI) into medical imaging has transformed clinical diagnostics, yet existing CNN-based systems still struggle to capture global spatial context and to generalize across modalities. This study addresses that gap by proposing a hybrid Vision Transformer (ViT) architecture for tumor detection in MRI and CT scans, evaluated on two benchmark datasets: BraTS (brain MRI) and LUNA16 (lung CT). The main contribution is a unified, end-to-end transformer model that processes heterogeneous modalities without traditional feature-level fusion. The proposed method combines convolutional layers for local feature extraction with transformer blocks for long-range dependency modeling. Extensive experiments show that the model achieves a 2.5% higher Dice score and a 3.1% higher F1-score than state-of-the-art CNN-based baselines, with accuracy reaching 95.4% on BraTS and 93.6% on LUNA16. Attention-based heatmaps further improve explainability by highlighting clinically relevant tumor regions. These results indicate that hybrid transformers offer a robust and interpretable framework for multi-modal tumor detection, paving the way for more reliable and transparent AI-assisted radiological diagnostics.
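
The abstract reports Dice and F1 scores but does not define how they were computed. As a minimal sketch, assuming flattened binary labels (1 = tumor, 0 = background), the two metrics could be computed as follows. The function names and the example masks are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for binary labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|), written in terms of TP, FP, FN."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def f1_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical flattened masks, for illustration only.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 1, 1, 0, 1, 0]
```

When both functions are applied to the same non-empty binary labels, they return the same value, since each reduces to 2TP / (2TP + FP + FN). The abstract does not state at which level (voxel, lesion, or scan) each metric was evaluated.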
