Balancing the Scales: Using GANs and Class Balance for Superior Malware Detection

Attaullah Buriro; F Luccio; Muhammad Azfar Yaqub

doi:10.1145/3672608.3707800

Conference proceeding

Open access

Peer reviewed

SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, pp.2032-2039

40th Annual ACM Symposium on Applied Computing (Catania, 31/03/2025–04/04/2025)

14/05/2025

DOI: https://doi.org/10.1145/3672608.3707800

Handle: https://hdl.handle.net/10863/48370

Abstract

Data-driven Artificial Intelligence (D2AI)

Convolutional Neural Network

Deep neural networks

Generative adversarial networks

Malware detection

Ensuring the security of a network infrastructure requires precise detection and categorization of malware. While existing methodologies have demonstrated high accuracy, their effectiveness has predominantly been validated on a limited subset of malware families or samples. These analyses often focus on malware families with a higher number of samples, potentially leading to biased and unrepresentative classification results. To address this gap, our study aims to enhance the accuracy and robustness of malware detection and categorization systems by investigating the impact of dataset size, class balance, and data augmentation techniques on classifier performance. We demonstrate the efficacy of our approach on a comparatively large dataset, the Blue Hexagon Open Dataset for Malware AnalysiS, comprising 134k samples. Our analysis, which uses 85 malware families with at least 50 samples each, achieves a highest accuracy of 92.28% with a Random Forest classifier on the original imbalanced dataset. However, by employing Generative Adversarial Networks to generate synthetic samples and thereby balance the class distributions, our approach improves the classifier's accuracy to 99.35%.
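
The selection and balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name balancing_plan, the toy labels, and the choice to balance every retained family up to the size of the largest one are assumptions made for illustration only. The sketch keeps families with at least 50 samples and reports how many synthetic samples a generator (for example a GAN, not shown here) would need to produce per family.

```python
from collections import Counter


def balancing_plan(labels, min_samples=50):
    """Map each retained family to the number of synthetic samples it needs.

    Hypothetical helper: families with fewer than `min_samples` samples are
    dropped, and the remaining ones are balanced up to the largest family
    (an assumed target; the abstract does not state the target size).
    """
    counts = Counter(labels)
    kept = {fam: n for fam, n in counts.items() if n >= min_samples}
    if not kept:
        return {}
    target = max(kept.values())
    return {fam: target - n for fam, n in kept.items()}


# Toy labels for illustration only; these are not taken from the dataset.
labels = ["fam_a"] * 200 + ["fam_b"] * 60 + ["fam_c"] * 10
plan = balancing_plan(labels)
print(plan)
```

With these toy labels, fam_c falls below the threshold and is excluded, while fam_b would need 140 synthetic samples to match fam_a.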

Files and links (2)

pdf: 3672608.3707800 (1.36 MB, Open Access)

url: https://doi.org/10.1145/3672608.3707800

Details

Title: Balancing the Scales: Using GANs and Class Balance for Superior Malware Detection
Creators: Attaullah Buriro - Ca' Foscari University of Venice
F Luccio - Ca' Foscari University of Venice
Muhammad Azfar Yaqub - Free University of Bozen-Bolzano
Publication Details: SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, pp.2032-2039
ISBN: 9798400706295
Conference: 40th Annual ACM Symposium on Applied Computing (Catania, 31/03/2025–04/04/2025)
Publisher: ACM/SIGAPP
Format: Online
Number of pages: 8
Identifiers: 979-840070629-5
(UNIBZ)89752141
991007064075901241
Web of Science ID: 001497934400271
Scopus ID: 2-s2.0-105006449125
Copyright: This work is licensed under a Creative Commons Attribution 4.0 International License.
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Conference proceeding
Author Names String: Buriro A, Luccio F, Yaqub MA
Additional Description: unibz-area: Data-driven Artificial Intelligence (D2AI)
ERC: Computer systems, parallel/distributed systems, sensor networks, embedded systems, cyber-physical systems;Computer science and informatics: informatics and information systems, computer science, scientific computing, intelligent systems
ERCCODE: PE6_2 ;PE6
MIURSSD: Informatica;Telecomunicazioni;Sistemi di elaborazione delle informazioni
MIURSSDCODE: INF/01;ING-INF/03;ING-INF/05
