PhishViT: A Vision Transformer-Based Framework for Real-Time Phishing Detection from Webpage Screenshots

Jean Chrysostome NDAYISABYE

dc.contributor.author	Jean Chrysostome NDAYISABYE
dc.date.accessioned	2026-04-13T08:01:19Z
dc.date.issued	2026
dc.description.abstract	Phishing attacks are among the most pervasive and economically damaging cyber threats. Conventional detection systems rely primarily on URL lexical analysis, DNS inspection, and HTML source-code heuristics. These text-centric approaches share a fundamental blind spot: they do not examine the visual rendering of webpages as perceived by human users, leaving a critical detection gap that visual-layer spoofing attacks exploit. This paper presents PhishViT, a Vision Transformer-based framework for real-time phishing detection that operates exclusively on webpage screenshots. Unlike methods that analyze URL strings or page source code, PhishViT learns discriminative visual representations directly from rendered webpage images using a fine-tuned Data-efficient Image Transformer (DeiT-Small) architecture. An automated Playwright browser pipeline captures live screenshots, which are classified as phishing or legitimate, and an interpretable attention rollout heatmap is generated for each prediction. The framework is developed through an iterative three-phase process: an initial prototype (V1: 253 screenshots, 78.95% accuracy), an expanded balanced dataset (V2: 642 screenshots, 96.91% accuracy), and a rigorous evaluation framework (V3). V3 comprises a baseline comparison against ResNet50, EfficientNet-B0, and ViT-Base; 5-fold cross-validation yielding 85.23% ± 1.18% accuracy; an ablation study validating each design choice; and a robustness evaluation under six visual perturbation conditions. The V3 evaluation shows that DeiT-Small achieves 91.75% accuracy, 93.48% precision, 89.58% recall, 91.49% F1-score, and an AUC-ROC of 0.9928 at only 5.44 ms inference latency, outperforming EfficientNet-B0 and ViT-Base while achieving the best efficiency-accuracy trade-off. These results establish PhishViT as a viable, interpretable, and computationally efficient visual-layer phishing detection framework suitable for real-time browser extension deployment.
dc.description.provenance	Submitted by Jean Chrysostome NDAYISABYE (ndayisabyejeanchrysostome@gmail.com) on 2026-04-10T17:11:29Z workflow start=Step: reviewstep - action:claimaction No. of bitstreams: 2 PhishViT.pdf: 969931 bytes, checksum: 0164c492f89d1713b81479f201d320c9 (MD5) license_rdf: 1025 bytes, checksum: 5fbab3a8de1b8b11fce4c9bca21b0aab (MD5)	en
dc.description.provenance	Step: reviewstep - action:reviewaction Approved for entry into archive by Jo Havemann (jo@africarxiv.org) on 2026-04-13T08:01:19Z (GMT)	en
dc.description.provenance	Made available in DSpace on 2026-04-13T08:01:19Z (GMT). No. of bitstreams: 2 PhishViT.pdf: 969931 bytes, checksum: 0164c492f89d1713b81479f201d320c9 (MD5) license_rdf: 1025 bytes, checksum: 5fbab3a8de1b8b11fce4c9bca21b0aab (MD5) Previous issue date: 2026	en
dc.identifier.uri	https://africarxiv.ubuntunet.net/handle/1/11307
dc.language.iso	en
dc.rights	Attribution 3.0 United States	en
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/
dc.subject	Phishing Detection
dc.subject	Vision Transformer
dc.subject	DeiT-Small
dc.subject	Screenshot Classification
dc.subject	Cybersecurity
dc.subject	Deep Learning
dc.subject	Attention Rollout
dc.subject	Cross-Validation
dc.subject	Robustness Evaluation
dc.title	PhishViT: A Vision Transformer-Based Framework for Real-Time Phishing Detection from Webpage Screenshots
dc.type	Working Paper

Files

Original bundle

Name: PhishViT.pdf
Size: 947.2 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed to upon submission

Collections

Working Paper