Experiences of landing machine learning onto market-scale mobile malware detection [conference paper]

Three computers with locks in a circle illustration

Conference

Proceedings of the Fifteenth European Conference on Computer Systems - April 15, 2020

Authors

Liangyi Gong, Zhenhua Li, Feng Qian (assistant professor), Zifan Zhang, Qi Alfred Chen, Zhiyun Qian, Hao Lin, Yunhao Liu

Abstract

App markets, being crucial and critical for today's mobile ecosystem, have also become a natural malware delivery channel since they actually "lend credibility" to malicious apps. In the past decade, machine learning (ML) techniques have been explored for automated, robust malware detection. Unfortunately, to date, we have yet to see an ML-based malware detection solution deployed at market scales. To better understand the real-world challenges, we conduct a collaborative study with a major Android app market (T-Market) offering us large-scale ground-truth data. Our study shows that the key to successfully developing such systems is manifold, including feature selection/engineering, app analysis speed, developer engagement, and model evolution. Failure in any of the above aspects would lead to the "wooden barrel effect" of the entire system. We discuss our careful design choices as well as our first-hand deployment experiences in building such an ML-powered malware detection system. We implement our design and examine its effectiveness in the T-Market for over one year, using a single commodity server to vet ~ 10K apps every day. The evaluation results show that this design achieves an overall precision of 98% and recall of 96% with an average per-app scan time of 1.3 minutes.

Link to full paper

Experiences of landing machine learning onto market-scale mobile malware detection

Keywords

mobile computing, security