A Vision Transformer Architecture for Automated Recognition of Parasitic Types in Microscopic Images

Pamungkas, Yuri and Triandini, Evi and Karim, Abdul and Sangsawang, Thosporn (2026) A Vision Transformer Architecture for Automated Recognition of Parasitic Types in Microscopic Images. International Journal of Robotics and Control Systems, 6 (1). pp. 383-400.

Abstract

Parasitic infections continue to pose a major global health challenge, and diagnosis still depends largely on manual microscopic examination. Although CNNs have been applied to automate parasite detection, they are limited in capturing global context, which is crucial for distinguishing subtle morphological differences. To overcome this limitation, this study introduces a Vision Transformer (ViT) architecture for automated recognition of multiple parasite species and host cells in microscopic images. The proposed approach was evaluated on a dataset of 34,298 images across eight classes: Babesia, Leishmania, Leukocyte, Plasmodium, Red Blood Cells (RBCs), Toxoplasma, Trichomonad, and Trypanosome. Images were preprocessed and augmented, transformed into patch embeddings, and passed through a series of transformer encoder modules whose multi-head self-attention mechanisms capture contextual dependencies across the image patches. A classification head produced the predictions, and interpretability was examined using Grad-CAM and Score-CAM. The ViT model achieved an accuracy of 99.70%, precision of 99.46%, recall of 99.40%, specificity of 99.60%, and F1-score of 99.45%. Confusion matrix analysis confirmed reliable predictions across all classes, and ROC curves yielded AUC values close to 1.0. Visualization showed that the model consistently focused on biologically meaningful features, with Score-CAM offering sharper localization than Grad-CAM. In conclusion, the proposed ViT architecture provides a highly accurate and interpretable framework for parasite recognition, with strong potential to improve diagnostic workflows and support reliable clinical decision-making.
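
The patch-embedding and multi-head self-attention steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the image size (32x32x3), patch size (8), embedding width (64), head count (4), and random weights are all assumptions chosen for illustration, and the class token, positional embeddings, layer normalization, and MLP sublayers of a full ViT encoder are omitted.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, wq, wk, wv, wo, heads):
    """Scaled dot-product self-attention over a (n_tokens, dim) sequence."""
    n, d = tokens.shape
    dh = d // heads  # width of each head

    def split(x):  # (n, d) -> (heads, n, dh)
        return x.reshape(n, heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, n)
    attn = softmax(scores, axis=-1)  # each patch attends to all patches
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return merged @ wo, attn


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, patch=8)  # 16 patches of 8*8*3 values
dim, heads = 64, 4
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))
tokens = patches @ w_embed  # linear patch embedding
wq, wk, wv, wo = (rng.normal(scale=0.02, size=(dim, dim)) for _ in range(4))
out, attn = multi_head_self_attention(tokens, wq, wk, wv, wo, heads)
```

Each row of `attn` is a distribution over all patches, which is how self-attention lets a single patch draw on context from anywhere in the image rather than only from a local neighborhood.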

Item Type: Article
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Depositing User: IJRCS ASCEE
Date Deposited: 28 Apr 2026 07:44
Last Modified: 28 Apr 2026 07:44
URI: https://alxiv.org/id/eprint/139
