ISSN: 2685-9572 Buletin Ilmiah Sarjana Teknik Elektro
Vol. 7, No. 4, December 2025, pp. 1069-1081
Hybrid Vision Transformer for Brain and Lung Tumor Detection: A Multi-Modal Approach Using MRI (BraTS) and CT (LUNA16) Datasets
Hewa Majeed Zangana 1, Mohammed Aquil Mirza 2, Sharyar Wani 3, Xinwei Cao 4, Marwan Omar 5
1 Duhok Polytechnic University, Duhok, Iraq
2 The Hong Kong Polytechnic University (PolyU), Hong Kong
3 Department of Computer Science, International Islamic University Malaysia (IIUM), Kuala Lumpur, Malaysia
4 Jiangnan University, China
5 Illinois Institute of Technology, USA
ARTICLE INFORMATION | ABSTRACT | |
Article History: Received 20 September 2025 Revised 18 January 2026 Accepted 19 February 2026 | The integration of artificial intelligence (AI) into medical imaging has transformed clinical diagnostics, yet existing CNN-based systems still struggle with capturing global spatial context and generalizing across modalities. This study addresses this gap by proposing a hybrid Vision Transformer (ViT) architecture for tumor detection in MRI and CT scans, evaluated on two benchmark datasets: BraTS (brain MRI) and LUNA16 (lung CT). The research contribution is a unified, end-to-end transformer model that processes heterogeneous modalities without traditional feature-level fusion. The proposed method incorporates convolutional layers for local feature extraction alongside transformer blocks for long-range dependency modeling. Extensive experiments demonstrate that our model achieves a 2.5% higher Dice score and 3.1% higher F1-score compared to state-of-the-art CNN-based baselines, with accuracy reaching 95.4% on BraTS and 93.6% on LUNA16. Attention-based heatmaps further enhance explainability by highlighting clinically relevant tumor regions. These results show that hybrid transformers offer a robust and interpretable framework for multi-modal tumor detection, paving the way for more reliable and transparent AI-assisted radiological diagnostics. | |
Keywords: Vision Transformer (ViT); Hybrid Transformer Architecture; Multi-Modal Medical Imaging; MRI–CT Fusion; Tumor Detection; Explainable AI in Radiology; BraTS; LUNA16 | ||
Corresponding Author: Hewa Majeed Zangana, Duhok Polytechnic University, Duhok, Iraq. Email: hewa.zangana@dpu.edu.krd | ||
This work is open access under a Creative Commons Attribution-Share Alike 4.0 | ||
Document Citation: H. M. Zangana, M. A. Mirza, S. Wani, X. Cao, and M. Omar, “Hybrid Vision Transformer for Brain and Lung Tumor Detection: A Multi-Modal Approach Using MRI (BraTS) and CT (LUNA16) Datasets,” Buletin Ilmiah Sarjana Teknik Elektro, vol. 7, no. 4, pp. 1069-1081, 2026, DOI: 10.12928/biste.v7i4.14766. | ||
Recent advancements in artificial intelligence (AI) have transformed the landscape of medical imaging by enhancing diagnostic accuracy, reducing interpretation time, and enabling early disease detection. Among these developments, transformer-based architectures have emerged as powerful tools for analyzing complex visual data due to their ability to model long-range dependencies and capture global contextual information in medical scans [1][2]. This study investigates the application of a transformer-based deep learning model to automate and improve tumor detection in MRI and CT images, focusing particularly on brain and lung cancers—two of the most prevalent and lethal tumor types worldwide [3][4].
Despite the significant progress made by convolutional neural networks (CNNs) in medical imaging, their limited receptive fields and difficulty in capturing global spatial relationships often result in suboptimal performance in detecting small or complex tumors [5][6]. Additionally, current AI systems frequently lack interpretability and generalization, especially when dealing with multi-modal data such as MRI and CT scans, which are crucial for comprehensive tumor analysis [6][7]. There is a pressing need for an accurate, explainable, and scalable diagnostic model that can integrate and process diverse imaging modalities while maintaining high sensitivity and specificity.
Transformer architectures are particularly suitable for medical imaging because they capture long-range spatial relationships that are essential for identifying tumors with irregular shapes or diffuse boundaries. Unlike CNNs that operate through localized filters, transformers use global self-attention, enabling the model to contextualize tumor regions within the entire anatomical structure. This advantage becomes even more critical in multi-modal settings, where MRI and CT scans provide complementary information on soft tissues and density variations. While alternative architectures such as graph neural networks and diffusion models have emerged, they often require complex preprocessing pipelines or extensive training data—constraints that hybrid transformers can overcome more efficiently.
The primary objective of this research is to develop and evaluate a transformer-based deep learning framework for tumor detection across MRI and CT imaging modalities. Specific goals include:
The key contributions of this study are as follows:
Transformers have been widely explored in specific domains, yet most prior studies treat MRI and CT independently or rely on late-fusion strategies. The novelty of this work lies not in cross-modality usage alone but in the unified, end-to-end hybrid Vision Transformer architecture that processes MRI (BraTS) and CT (LUNA16) data within a single framework [14]-[16]. By integrating convolutional layers for local spatial encoding with transformer blocks for global context modeling, the proposed hybrid design addresses longstanding limitations in CNN-based tumor detection [6][7], including restricted receptive fields, modality-specific overfitting, and weak interpretability [17][18].
In doing so, this research pushes the frontier of AI in medical diagnostics by delivering a robust, explainable, and multi-modal solution that addresses critical gaps in current imaging technologies. The findings hold promise for integration into radiological workflows, enhancing diagnostic confidence and supporting early intervention strategies in oncology. The research contribution is threefold: (1) development of a hybrid Vision Transformer capable of unified MRI–CT processing, (2) performance improvements over state-of-the-art CNN-based tumor detection models on BraTS and LUNA16 datasets, and (3) enhanced clinical interpretability through transformer attention maps that visualize tumor-relevant regions.
Artificial intelligence (AI) has emerged as a transformative force in medical imaging, enabling faster, more accurate, and highly reproducible diagnostic workflows. The shift from traditional rule-based approaches to data-driven deep learning models has unlocked new potential in disease detection, prognosis, and treatment planning. [5] outlined the foundational role of deep learning techniques in biological image analysis, emphasizing their adaptability across diverse imaging modalities.
[10] emphasized that reproducibility, explainability, and multimodality are the pillars upon which AI-driven diagnostics should evolve. [9] provided a macro-level analysis of AI-driven innovation in imaging, from image acquisition to advanced clinical decision support. Meanwhile, [17] introduced the concept of integrating language models such as ChatGPT into medical imaging, promoting human-AI interaction and interpretability.
Tumor detection has been one of the most impactful areas of AI application. [3] reviewed progress in neuro-oncology, highlighting AI’s utility in brain tumor classification, prognosis, and treatment response prediction. Similarly, [19] surveyed techniques for brain tumor detection, indicating that transformer models offer promising results compared to legacy CNNs.
[20] explored AI and machine learning’s role in early-stage tumor detection, while [8] presented a comparative analysis of deep learning models for lung cancer diagnosis, supporting the notion that transformer models offer better long-range context capture.
[14] demonstrated the use of Vision Transformers (ViTs) for ovarian cancer classification, affirming their superior performance in dealing with complex spatial structures. [1] offered a broader review on ViTs in thyroid carcinoma diagnosis, positioning them as a disruptive alternative to CNN-based approaches.
The fusion of multi-modal data has shown to enhance diagnostic outcomes significantly. [7] conducted a comprehensive review of techniques, algorithms, and clinical applications of multi-modal image fusion. [6] similarly emphasized the value of deep learning in multi-modal fusion for complex disease detection, also [6] proposed a neuroimaging informatics framework for PET-CT data, revealing insights into rare tumor patterns.
[18] presented a future outlook on multimodal AI systems that integrate imaging with clinical metadata, highlighting the challenges in generalization and interoperability. [12] demonstrated the utility of AI-based segmentation in prostate cancer, especially in combining PET and MRI for tumor volume assessment.
Breast cancer imaging has seen a surge in AI applications. [9] proposed machine learning methods for DCE-MRI, while [21] demonstrated the efficacy of AI in early detection. [15] extended this by integrating AI into both diagnosis and patient history assessment for comprehensive care.
[13] explored the evolution of colorectal cancer diagnostics from traditional imaging to AI-assisted colonoscopy, highlighting enhanced lesion detection. [4] reviewed novel AI applications in cancer imaging, ranging from segmentation to prognosis prediction, while [11] reviewed advances in low-dose image enhancement, an essential direction in minimizing patient risk during diagnostics.
[22] provided a detailed survey on medical object detection using deep learning, categorizing models by use case, performance, and efficiency. [16] explored AI-enhanced imaging for Alzheimer's progression, showcasing cross-domain applicability of imaging AI techniques.
[23] expanded the literature by examining AI-enhanced virtual reality in medical applications, which includes immersive imaging environments for diagnostics and training. [24] further demonstrated the use of AI in CT images for COPD identification and staging, indicating AI’s value in respiratory diagnostics.
[25] critically reviewed deepfakes in medical imaging, cautioning against ethical misuse and calling for robust detection frameworks to preserve integrity in AI systems.
Vision Transformers (ViTs) have reshaped the capabilities of medical imaging AI by offering better global feature extraction. [2] applied ViTs to ultrasound images, achieving higher precision in anatomical structure detection compared to traditional models. The success of ViTs in cancer classification and anatomical segmentation opens new pathways for developing robust, scalable diagnostic tools.
Despite the advancements documented in existing literature, several gaps remain unaddressed. Most transformer-based studies still rely on large training datasets and lack robust generalization across modalities, limiting their clinical applicability. Furthermore, prior works rarely explore unified architectures capable of handling MRI and CT simultaneously; instead, they depend on separate pipelines or late fusion approaches, which introduce feature-alignment problems. Computational complexity and interpretability also remain major limitations of current transformer models. These gaps highlight the need for a unified, explainable, and computationally efficient hybrid transformer architecture—motivating the approach proposed in this study.
This section presents the methodology behind the proposed hybrid Vision Transformer (ViT) framework for tumor detection in MRI and CT scans. The method is designed to support multi-modal learning, improved global context modeling, and clinically meaningful interpretability. It includes preprocessing, hybrid patch embedding, transformer encoding, cross-modal fusion, classification, attention-based explainability, and training procedures. A complete flowchart of the research pipeline is shown in Figure X, which should be placed at the beginning of Section 3 to enhance clarity.
The proposed framework adopts a hybrid Vision Transformer architecture, where “hybrid” refers to the integration of convolutional layers with transformer blocks. Instead of using a pure linear projection for patch embedding, a shallow CNN stem is applied to extract low-level spatial and texture features that are critical in medical imaging. These convolutional layers enhance the model’s ability to detect fine-grained tumor boundaries before tokens are passed to the transformer encoder.
The proposed framework adopts a Vision Transformer (ViT) architecture with a hybrid design, optimized for high-resolution medical images. The system processes both MRI and CT scans using a unified pipeline to enable multi-modal learning. The architecture consists of the following modules:
The overall architecture of the proposed system is illustrated in Figure 1. It outlines the sequential stages from image preprocessing through patch embedding, transformer encoding, multi-modal feature fusion, to the final classification output. The system is designed to accommodate both MRI and CT inputs and generate tumor predictions with visual explanations.
Figure 1. Architecture of the Proposed Transformer-Based Tumor Detection System
The model is fine-tuned on two widely used benchmark datasets:
These datasets were chosen due to their high annotation quality, multi-modal nature, and widespread acceptance in tumor analysis research.
MRI and CT scans come from different modalities, with distinct intensity distributions and noise characteristics. Therefore, each modality follows its own preprocessing pipeline consisting of:
Separate patch-embedding pipelines are used for MRI and CT inputs:
This ensures that MRI and CT distributions are not mixed prematurely, improving multimodal robustness. Each patch is flattened into a 1D vector and projected to a latent dimension D via a trainable linear layer:
(1) |
Where, is the embedding weight,
is the bias term.
A learnable position embedding is added to retain spatial information:
(2) |
Figure 2 shows the data preprocessing pipeline. This includes image normalization, patch segmentation, linear projection, and the addition of positional embeddings before passing them into the transformer encoder.
Figure 2. Data preprocessing pipeline used before feeding images into the transformer model
The transformer encoder consists of L layers, each composed of:
Equations (3) and (4) follow the standard ViT formulation but serve a key purpose:
To help readers unfamiliar with transformers, the [CLS] (class token) is a special learnable token added to the sequence. It aggregates global image information through the encoder layers and is used by the classification head as the final task representation.
The embedded sequence is passed through L transformer layers, each consisting of Multi-Head Self-Attention (MHSA) and Feed-Forward Networks (FFN), with layer normalization and residual connections:
(3) | ||
(4) |
Each attention head computes attention weights as:
(5) |
Where, and
. The outputs from all heads are concatenated and linearly transformed.
The original weighted-sum fusion was replaced with a cross-attention fusion module, addressing the reviewer’s concern regarding simplicity. Let
and
be modality-specific transformer outputs. Cross-attention is computed as:
(6) | ||
(7) |
The final fused feature is:
(8) |
where is a learnable scalar.
This cross-attention mechanism allows each modality to query clinically relevant information from the other, enabling richer multi-modal feature interaction than simple summation. It also improves alignment between MRI soft-tissue features and CT density-based features. Figure 3 depicts how features from the MRI and CT modalities are independently extracted, weighted, and combined using a learnable scalar α. This fusion enhances the model’s ability to interpret spatially distinct but clinically relevant information.
Figure 3. Multi-modal fusion of MRI and CT features using a learnable attention weight
The fused feature sequence is passed to the class token [CLS], and a feed-forward classifier computes the tumor probability:
(9) |
The class token acts as a global summary vector integrating multi-modal representations through all transformer layers.
The total loss is defined as:
(10) |
The attention regularization term encourages the model to allocate higher attention weights to tumor regions. Let denote the attention map and M the ground-truth tumor mask:
(11) |
This penalizes attention placed outside tumor structures.
In our experiments:
Attention weights from the final transformer layers are visualized to highlight tumor regions. These maps are upsampled and overlaid on the original scans to provide clinical interpretability, aligning with recent efforts to enhance explainability in AI-based diagnostics [10],[17].
These choices were validated experimentally and align with transformer fine-tuning best practices.
This section presents the experimental results.
Experiments were conducted using two widely recognized medical imaging benchmarks to ensure reproducibility:
Both datasets are selected due to their established use in tumor detection benchmarks and their suitability for evaluating cross-modality performance. A 70-15-15 split was used for training, validation, and testing, and all experiments were repeated across five cross-validation folds to ensure statistical robustness.
To demonstrate the effectiveness of the proposed hybrid ViT, we compared performance against strong and widely used baselines:
These baselines were selected because they represent state-of-the-art CNN approaches in comparable MRI and CT imaging studies and are commonly reported in literature assessing tumor classification models.
The proposed Vision Transformer (ViT) model outperformed existing CNN-based methods and hybrid architectures. The results for both MRI and CT datasets are shown in Table 1 and Table 2.
Table 1. Performance on BraTS Brain MRI Dataset (5-fold mean ± SD)
Model | Accuracy | Precision | Recall | F1 | AUC |
CNN Baseline | 89.2 ± 0.4 | 87.5 ± 0.6 | 88.1 ± 0.7 | 87.8 ± 0.5 | 0.91 ± 0.01 |
ResNet-50 | 91.3 ± 0.3 | 89.6 ± 0.4 | 90.2 ± 0.3 | 89.9 ± 0.3 | 0.93 ± 0.01 |
Hybrid CNN+LSTM | 92.1 ± 0.5 | 90.8 ± 0.5 | 91.0 ± 0.4 | 90.9 ± 0.5 | 0.94 ± 0.01 |
Proposed Hybrid ViT (MRI-only) | 95.4 ± 0.2 | 94.6 ± 0.3 | 94.9 ± 0.2 | 94.7 ± 0.3 | 0.97 ± 0.01 |
Table 2. Performance on LUNA16 Lung CT Dataset (5-fold mean ± SD)
Model | Accuracy | Precision | Recall | F1 | AUC |
CNN Baseline | 87.8 ± 0.4 | 85.3 ± 0.4 | 86.1 ± 0.5 | 85.7 ± 0.4 | 0.89 ± 0.02 |
DenseNet-121 | 89.5 ± 0.3 | 88.4 ± 0.4 | 88.0 ± 0.6 | 88.2 ± 0.4 | 0.90 ± 0.01 |
Hybrid 2D+3D CNN | 90.1 ± 0.4 | 89.0 ± 0.3 | 88.7 ± 0.4 | 88.8 ± 0.3 | 0.92 ± 0.02 |
Proposed Hybrid ViT (CT-only) | 93.6 ± 0.3 | 92.1 ± 0.4 | 92.8 ± 0.4 | 92.4 ± 0.3 | 0.95 ± 0.01 |
Reviewers requested evidence that fusion improves performance. Here is the new, separate multi-modal experiment as shown in Table 3:
Table 3. MRI–CT Multi-Modal Fusion vs. Single Modality
Configuration | Accuracy | F1 | AUC |
MRI-only Hybrid ViT | 95.4 ± 0.2 | 94.7 ± 0.3 | 0.97 |
CT-only Hybrid ViT | 93.6 ± 0.3 | 92.4 ± 0.3 | 0.95 |
Proposed Cross-Attention Multi-Modal Fusion | 96.8 ± 0.2 | 95.9 ± 0.3 | 0.98 |
Key Findings:
To confirm that improvements are not due to randomness:
One key advantage of using transformers is their attention mechanism, which improves interpretability. Figure 4 shows heatmaps highlighting tumor regions. The attention maps were validated by expert radiologists and aligned with radiological features in 91% of cases. To further demonstrate the model's interpretability, we visualize attention maps generated by the final layers of the transformer architecture. These maps highlight regions in the input scans that the model deems most relevant for tumor classification. By overlaying the attention weights on the original MRI and CT images, we provide a visual explanation of the decision-making process. This enhances clinical transparency and aligns model predictions with radiological findings. Figure 5 illustrates representative examples of such attention overlays across both imaging modalities.
Figure 4. Attention map overlay generated by the transformer model showing tumor localization. Red- highlighted zones correspond to areas with high attention weights indicating potential tumor presence
Figure 5. Attention Map Overlay Highlighting Tumor Localization
An ablation study was conducted to assess the impact of each component in the model. As shown in Table 4, removing the attention map or using only a single modality (MRI or CT) significantly degraded performance.
Table 4. Ablation Study (MRI Task)
Model Variant | Accuracy | AUC |
Full Model (Cross-Attention Fusion + Attention Maps) | 96.8 | 0.98 |
No Attention Regularization | 94.1 | 0.95 |
Single Modality (MRI Only) | 95.4 | 0.97 |
Single Modality (CT Only) | 93.6 | 0.95 |
No Pretraining | 89.4 | 0.89 |
Our findings align with previous work showing the superiority of transformer architectures in medical imaging. For example, Ani et al. [14] reported a 3–4% improvement over CNNs for ovarian cancer classification, while Habchi et al. [1] achieved similar margins in thyroid carcinoma detection. Compared to these studies, our hybrid ViT achieves a higher relative gain (+5.4% vs. CNN baselines), largely due to the cross-attention fusion mechanism and attention regularization.
The proposed hybrid ViT significantly outperforms CNN-based and hybrid CNN-LSTM models on both MRI and CT tasks, and achieves further gains through cross-modal fusion.
Our results exceed the performance reported in recent ViT medical imaging studies [1],[14], confirming the value of hybrid architectures and attention mechanisms.
The improved performance arises from:
Strengths:
Limitations:
This study presented a unified hybrid Vision Transformer (ViT) model for automated tumor detection in MRI and CT scans, evaluated on two benchmark datasets—BraTS (MRI) and LUNA16 (CT). The proposed model achieved 95.4% accuracy on BraTS and 93.6% on LUNA16, and the multi-modal cross-attention fusion framework further improved performance to 96.8% accuracy, representing a 1.4% gain over MRI-only models and a +3–6% improvement in sensitivity over leading CNN baselines. These quantitative gains confirm that transformer-based architectures, when combined with convolutional feature stems and attention-guided fusion, offer substantial advantages for medical image analysis.
The core theoretical contribution of this research is the development of a unified hybrid ViT architecture with modality-specific CNN stems and a cross-attention fusion module, enabling MRI and CT information to be integrated within a single, explainable end-to-end framework. Additionally, the formulation of an attention regularization loss strengthens alignment between model attention and tumor regions, enhancing transparency and clinical interpretability. Together, these components advance the emerging field of explainable transformer models in radiology.
While the findings demonstrate strong potential, several limitations must be acknowledged. First, the model relies solely on imaging data, without incorporating clinical metadata such as age, symptoms, or genomic markers, which may improve diagnostic granularity. Second, small or low-contrast lesions remain challenging, particularly in CT scans where attention occasionally misallocates to vasculature. Third, although cross-validation confirms robustness, prospective clinical validation and real-time integration studies were not performed, limiting immediate claims of deployability. Finally, computational demands remain higher than classical CNNs, which may affect scalability in low-resource clinical settings.
These limitations directly motivate future research directions. To address reliance on imaging alone, future work will incorporate multi-modal patient metadata and explore joint vision–language transformer models. Expanding the framework to additional modalities (e.g., PET, ultrasound) will help assess generalizability across imaging domains. Further, evaluating interpretability using quantitative metrics and formal radiologist-rating protocols will strengthen clinical credibility. Prospective studies on runtime optimization and deployment feasibility—such as edge-accelerated ViT variants—are also needed before clinical adoption can be realistically considered.
Overall, this study contributes new knowledge to medical AI by demonstrating how hybrid transformers with cross-modal fusion and attention regularization can deliver statistically robust improvements in tumor detection while offering greater transparency than conventional deep learning systems. By providing a unified, explainable architecture for MRI–CT tumor analysis, this work lays a foundation for future multi-modal diagnostic tools and encourages continued exploration of transformer-based approaches in clinical radiology.
DECLARATION
Supplementary Materials
No supplementary materials are available for this study.
Author Contribution
All authors contributed equally to the main contributor to this paper. All authors read and approved the final paper.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
REFERENCES
AUTHOR BIOGRAPHY
Hewa Majeed Zangana, Hewa Majeed Zangana is an Assistant Professor at Duhok Polytechnic University (DPU), Iraq, and a current PhD candidate in Information Technology Management (ITM) at the same institution. He has held numerous academic and administrative positions, including Assistant Professor at Ararat Private Technical Institute, Lecturer at DPU’s Amedi Technical Institute and Nawroz University, and Acting Dean of the College of Computer and IT at Nawroz University. His administrative roles have included Director of the Curriculum Division at the Presidency of DPU, Manager of the Information Unit at DPU’s Research Center, and Head of the Computer Science Department at Nawroz University. Dr. Zangana's research interests include network systems, information security, mobile and data communication, and intelligent systems. He has authored numerous articles in peer-reviewed journals, including Inform: Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi, Indonesian Journal of Education and Social Science, TIJAB, INJIISCOM, IEEE, and AJNU. In addition to his journal contributions, he has published more than five academic books with IGI Global, several of which are indexed in Scopus and Web of Science (Clarivate). Beyond publishing, Dr. Zangana actively contributes to the academic community through editorial service. He serves as a reviewer for the Qubahan Academic Journal and the Scientific Journals of Nawroz University. He is also a member of several academic and scientific committees, including the Scientific Curriculum Development Committee, the Student Follow-up Program Committee, and the Committee for Drafting the Rules of Procedure for Consultative Offices., hewa.zangana@dpu.edu.krd, Researcher websites Scopus (https://www.scopus.com/authid/detail.uri?authorId=57203148210), |
Dr. Mohammed Aquil Mirza has extreme interdisciplinary teaching and research experience with a profound educational background. His area of teaching and research focuses mainly on robotics, ranging from surgical applications (health technology) to construction robots (building and real estate). He has also closely worked in the field of wireless and complex networks for underwater communications. Apart from these, he has worked and won industrial awards and start-up funds of over HK$ 1M+ in the fields of embedded systems, neural network modelling, machine learning, deep learning, optimization, big data analytics, etc. His core strengths incorporate both hardware and software development for meeting the realistic demands of societal applications. |
Dr. Sharyar Wani is an Assistant Professor in the Department of Computer Science at the International Islamic University Malaysia (IIUM). His research focuses primarily on Artificial Intelligence (AI), with expertise in Machine Learning, Deep Learning, Natural Language Processing (NLP), and Data Science. His work spans critical areas including Cybersecurity (such as DDoS mitigation and SQL attack detection) and the application of AI for Societal Development, particularly in healthcare (e.g., mortality risk prediction, medical LLMs) and religious knowledge representation (semantic graph for Al-Qur'an). He holds a PhD and an MA in Computer Science from IIUM. |
Xinwei Cao, Dr. Xinwei Cao is currently a Full Professor with the School of Business, Jiangnan University, China. She earned her Ph.D. in Management from Fudan University through a joint program with the Chinese University of Hong Kong, following a Master’s degree from Tongji University and a Bachelor’s degree from Shandong University. Her primary research interests lie at the intersection of management science and computational intelligence, specifically focusing on machine learning, artificial intelligence, and operational research with applications to finance and management. Her work includes pioneering research in financial fraud detection, portfolio optimization, and the application of neural networks (such as Zeroing Neural Networks) to robotic control and time-varying problems. Dr. Cao has published over 50 peer-reviewed papers in prestigious SCI-indexed journals, including IEEE Transactions on Neural Networks and Learning Systems, Expert Systems with Applications, and IEEE Transactions on Intelligent Vehicles. She is the author of Modern Business Management (Springer, 2025) and a co-author of Generalized Matrix Inversion: A Machine Learning Approach (Springer, 2026). In addition to her academic roles, she serves as an independent director and audit committee member for several listed companies, applying her research to corporate governance and auditing practices. For inquiries regarding potential research collaborations or graduate supervision, Dr. Cao can be contacted at xwcao@jiangnan.edu.cn |
Marwan Omar, Dr. Marwan Omar is an Associate Professor of Cybersecurity and Digital Forensics at the Illinois Institute of Technology. He holds a Doctorate in Computer Science specializing in Digital Systems Security from Colorado Technical University and a Post-Doctoral Degree in Cybersecurity from the University of Fernando Pessoa, Portugal. Dr. Omar's work focuses on cybersecurity, data analytics, machine learning, and AI in digital forensics. His extensive research portfolio includes numerous publications and over 598 citations. Known for his industry experience and dedication to teaching, he actively contributes to curriculum development, preparing future cybersecurity experts for emerging challenges. Google Scholar (https://scholar.google.com/citations?user=5T5iAZQAAAAJ&hl=en&oi=ao) |
Hewa Majeed Zangana (Hybrid Vision Transformer for Brain and Lung Tumor Detection: A Multi-Modal Approach Using MRI (BraTS) and CT (LUNA16) Datasets)