Zangana, Hewa Majeed and Mirza, Mohammed Aquil and Wani, Sharyar and Cao, Xinwei (2026) Hybrid Vision Transformer for Brain and Lung Tumor Detection: A Multi-Modal Approach Using MRI (BraTS) and CT (LUNA16) Datasets. Buletin Ilmiah Sarjana Teknik Elektro, 7 (4). pp. 1069-1081.
14766-Article Text-73002-1-10-20260219.pdf - Published Version
Abstract
The integration of artificial intelligence (AI) into medical imaging has transformed clinical diagnostics, yet existing CNN-based systems still struggle to capture global spatial context and to generalize across modalities. This study addresses this gap by proposing a hybrid Vision Transformer (ViT) architecture for tumor detection in MRI and CT scans, evaluated on two benchmark datasets: BraTS (brain MRI) and LUNA16 (lung CT). The main contribution is a unified, end-to-end transformer model that processes heterogeneous modalities without traditional feature-level fusion. The proposed method combines convolutional layers for local feature extraction with transformer blocks for long-range dependency modeling. Extensive experiments demonstrate that the model achieves a 2.5% higher Dice score and a 3.1% higher F1-score than state-of-the-art CNN-based baselines, with accuracy reaching 95.4% on BraTS and 93.6% on LUNA16. Attention-based heatmaps further enhance explainability by highlighting clinically relevant tumor regions. These results show that hybrid transformers offer a robust and interpretable framework for multi-modal tumor detection, paving the way for more reliable and transparent AI-assisted radiological diagnostics.
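For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how a Dice score and an F1-score can be computed from binary predictions. It is an illustration only: the abstract does not state how the authors computed these values (for example per voxel, per lesion, or per scan), and the function names `confusion_counts`, `dice_score`, and `f1_score` are hypothetical helpers, not part of the paper's code.

```python
def confusion_counts(pred, target):
    """Return (true positives, false positives, false negatives)
    for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def dice_score(pred, target):
    """Dice coefficient: 2*TP / (2*TP + FP + FN).
    Assumption: two empty masks count as a perfect match (1.0)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def f1_score(pred, target):
    """F1: harmonic mean of precision and recall.
    Assumption: returns 0.0 when precision or recall is undefined or zero."""
    tp, fp, fn = confusion_counts(pred, target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy flattened masks, not data from the paper.
    pred = [1, 1, 0, 0, 1]
    target = [1, 0, 0, 1, 1]
    print(dice_score(pred, target), f1_score(pred, target))
```

On the same set of binary labels the two formulas are algebraically identical whenever at least one true positive exists, so reporting distinct Dice and F1 gains, as the abstract does, suggests the metrics were evaluated at different granularities (for instance, Dice on segmentation masks and F1 on detection decisions). That reading is an inference, not something the abstract confirms.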
| Item Type: | Article |
|---|---|
| Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering |
| Depositing User: | BISTE UAD |
| Date Deposited: | 16 May 2026 16:37 |
| Last Modified: | 16 May 2026 16:37 |
| URI: | https://alxiv.org/id/eprint/843 |
