Lightweight Ensemble Models for Static Malware Detection: Addressing Deep Learning Trade-Offs with the Kaggle PE Dataset
DOI:
https://doi.org/10.71129/ijaci.v2i1.pp21-33
Keywords:
Static Malware Detection, Ensemble Model, Classification, Feature Selection, Interpretability
Abstract
Static malware detection plays a crucial role in cybersecurity by enabling the identification of malicious files without the need to execute them. This study explores the effectiveness of lightweight ensemble models as an efficient and interpretable alternative to deep learning approaches. Using the Kaggle PE Malware Dataset, eight classifiers were evaluated: Random Forest, Extra Trees, Gradient Boosting, LightGBM, XGBoost, CatBoost, HistGradientBoosting, and Multilayer Perceptron. A one-dimensional Convolutional Neural Network (CNN) served as a performance benchmark. To address class imbalance, three resampling methods were applied separately: the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling, and SMOTE combined with Edited Nearest Neighbors. The dataset was normalized, and SelectKBest was used to select the 20 most informative features. All models were tuned using the Optuna framework. Experimental results showed that the Voting Classifier achieved perfect accuracy, precision, recall, and F1-score on the validation set when trained with SMOTE and the full feature set, and retained nearly identical results using only the top 20 features. Although the CNN baseline required more computational resources, it did not outperform the optimized ensemble models. Furthermore, SHAP analysis provided insights into feature importance and improved interpretability. These findings indicate that lightweight ensemble classifiers are a practical and effective solution for static malware detection, offering high accuracy, fast inference, and greater transparency, which makes them suitable for deployment in environments with limited computing capacity.
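As a rough illustration of the idea behind SMOTE, the oversampling method named in the abstract, the sketch below creates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. It is not the authors' code: the function name `smote_oversample`, the value of `k`, the seed, and the toy feature vectors are all assumptions made for illustration, and a study of this kind would more likely rely on a library implementation such as the one in imbalanced-learn.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and place the new point at a random
    position on the line segment joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up feature vectors (hypothetical; not taken from the Kaggle PE dataset).
minority_class = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3], [0.7, 0.1]]
new_points = smote_oversample(minority_class, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies. Resampling of this kind is normally applied to the training split only, so that the validation set keeps its original class distribution.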
License
Copyright (c) 2026 Nidya Sari Rahmawati, Chalvina Izumi Amalia (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.


