A Feature Engineering Approach for Classification and Detection of Polymorphic Malware using Machine Learning

Abstract: 
Malicious software has evolved into a major threat to computer systems. These threats aim to compromise systems at every level, from individuals to large corporations and even government institutions. The technology behind them is constantly evolving: sophisticated tactics create multiple instances of existing malware by encrypting the malicious payload and changing the code structure at each infection, while retaining the same functionality. Because of this polymorphism, current solutions cannot yet address the problem sufficiently. Such solutions are mostly signature-based, and changing malware means a changing signature. Ideally, the detection rate would be 100%; in practice, the rates of false positives and false negatives are high. Frequent updating of signature-based systems also requires more resources. Previous research puts the malware detection rate of current anti-malware products at between 25% and 50%. Polymorphic malware easily evades detection, and classifying it into its respective families is also hard, which makes it more difficult to eliminate. This study addresses the problem by providing a structured, deeper analysis of the key features or parameters that enable polymorphism in malware. It proposes a Novel Feature Engineering (NFE) approach for better classification and detection of polymorphic malware based on structural and behavioural features. Key features were engineered to help build powerful classification and detection models. The research employs the NFE approach to create an improved malware classification model. Three classifiers, namely K-Nearest Neighbours (KNN), Linear Discriminant Analysis and Gradient Boosting Machine (GBM), were used for this task. The best model achieved an accuracy of 94% on malware classification. The research also developed an early-stage, deep-learning-based detection model. The experiments achieved an early detection accuracy of 99.66% within the first five seconds of file execution. The solutions provided by this research could help the anti-malware industry create better protection mechanisms.
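The abstract's point that changing malware means a changing signature can be illustrated with a minimal sketch. This is not the thesis's method: the one-byte XOR encoding, the SHA-256 "signature", and the names `xor_encode` and `signature` are assumptions chosen for illustration, and the payload is a harmless placeholder string.

```python
import hashlib


def xor_encode(payload: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key (a toy stand-in for payload encryption)."""
    return bytes(b ^ key for b in payload)


def signature(data: bytes) -> str:
    """Hash-based signature, as a simple signature scanner might store it (assumption)."""
    return hashlib.sha256(data).hexdigest()


# Harmless placeholder bytes standing in for a payload.
payload = b"toy payload: same behaviour in every variant"

# Each "infection" re-encodes the same payload with a different key.
variants = {key: xor_encode(payload, key) for key in (0x11, 0x22, 0x33)}
signatures = {key: signature(body) for key, body in variants.items()}

# XOR is its own inverse, so decoding each variant recovers the same payload.
all_decode_same = all(xor_encode(body, key) == payload for key, body in variants.items())

# The stored bytes differ per variant, so the hash-based signatures differ too.
all_signatures_differ = len(set(signatures.values())) == len(signatures)

print(all_decode_same, all_signatures_differ)
```

Every variant decodes to the same bytes yet hashes differently, which is why a scanner that matches stored signatures has to be updated for each new variant, and why the study turns to structural and behavioural features instead.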
Date of publication: 
2019
Region Focus: 
East Africa
Collection: 
RUFORUM Theses and Dissertations
Supervisor: 
Dr. Swaib Kyanda Kaawaase; Dr. Julianne Sansa‑Otim
Form: 
Web resource
Extent: 
xvii, 161