Abstract
This study presents comprehensive experimental results for the IEEE BigData 2022 Cup, in which we used generative adversarial networks (GANs) to generate appropriate malign samples that improve a malware classifier’s performance. In the experiments, we employed conditional tabular GAN (CTGAN), conditional table GAN (CTAB-GAN), and Complementary GAN architectures to address the data imbalance problem commonly encountered in classification tasks. The results showed that CTAB-GAN outperformed the other GANs in producing synthetic data that are statistically similar to the given training data. With these synthetic data, the classifier’s performance on the validation dataset also improved, which suggests that higher-quality synthetic data yield better classification performance in terms of machine learning efficacy. Although CTAB-GAN performed better than CTGAN and Complementary GAN in terms of statistical similarity and machine learning efficacy, it could overfit the training data. Therefore, we used both CTGAN and CTAB-GAN to produce a balanced dataset for training the classifier in the final solution. The root mean square error (RMSE) of the final classifier was 0.103, an improvement of 0.066 over the baseline RMSE of 0.169.
KakaoBank Financial Tech Lab