Malconv Tuning: Malware Detection using more than Just Raw Bytes


AS Mallesh
Gunamani Jena
Mupparthi Bala Veera Venkata Satya Sai
Yarramsetti Sajeev Kumar
Vasarla Durga Sandeep
Puramsetti Satish
Tarun Kudupud

Abstract

Malware is a major security threat in today's digital world. Traditional security solutions are
unable to keep up with emerging malware. Machine learning is currently bearing fruit in a
variety of fields, and its application in security is gaining traction. Many kinds of features,
such as opcodes and byte entropy, can serve as model inputs. Raw bytes of binaries can also be
used as machine learning inputs without any domain knowledge. However, the size of the
raw-byte input is limited. Furthermore, when binary sizes differ significantly, raw bytes
alone may lack the information needed to make sound decisions. In this paper, we
implement Tuning Malconv, a detection model that uses richer features to detect malware.
Tuning Malconv is made up of two layers, each of which is an independent model. The first
layer takes raw bytes as its input features. If the first layer is unable to make a decision, the
second layer extracts byte-code n-grams, PE imports, string patterns, and PE section names
from the binary, then uses these features as inputs to make the final decision.
To evaluate our model, we use two datasets, one small and one large. The results of the
experiments show that Tuning Malconv can achieve robust performance. It takes
approximately 1100 seconds to classify 8213 software samples (roughly 0.13 seconds per
sample), which is reasonable. Tuning Malconv's
overall accuracy can reach 99.03 percent on our small dataset and 98.69 percent on our large
dataset. Tuning Malconv can thus perform efficient and effective malware detection with
features that go beyond raw bytes.
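
The two-layer flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the first layer signals that it is "unable to make decisions", so the confidence thresholds (low, high), the 0.5 cut-off, and the names byte_ngrams, printable_strings, and cascade_predict are all assumptions made for illustration. Both layers are passed in as plain callables that return a malware probability. PE imports and PE section names are left out because extracting them needs a PE parser.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a binary (hypothetical helper)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def printable_strings(data: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)


def cascade_predict(
    data: bytes,
    first_layer: Callable[[bytes], float],
    second_layer: Callable[[Dict[str, object]], float],
    low: float = 0.1,   # assumed threshold: at or below this, call it benign
    high: float = 0.9,  # assumed threshold: at or above this, call it malware
) -> str:
    """Use the raw-byte model first; fall back to richer features if unsure."""
    p = first_layer(data)
    if p >= high:
        return "malware"
    if p <= low:
        return "benign"
    # First layer is undecided: build the richer feature set for layer two.
    features = {
        "ngrams": byte_ngrams(data),
        "strings": printable_strings(data),
    }
    return "malware" if second_layer(features) >= 0.5 else "benign"
```

In this sketch the second layer runs only on samples the first layer leaves undecided, so the cost of feature extraction is paid for a subset of the inputs rather than for every binary.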
