Jonathan Crussell, Philip Kegelmeyer

Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015)

Abstract

Many security applications depend critically on clustering. However, we do not know of any clustering algorithms that were designed with an adversary in mind. An intelligent adversary may be able to use this to her advantage to subvert the security of the application. Already, adversaries use obfuscation and other techniques to alter the representation of their inputs in feature space to avoid detection. For example, malware is often packed and spam email often mimics normal email. In this work, we investigate a more active attack, in which an adversary attempts to subvert clustering analysis by feeding in carefully crafted data points.

Specifically, we explore how an attacker can subvert DBSCAN, a popular density-based clustering algorithm. We explore a “confidence attack,” where an adversary seeks to poison the clusters to the point that the defender loses confidence in the utility of the system. This may result in the system being abandoned or, worse, in the defender wasting time investigating false alarms. While our attacks generalize to all DBSCAN-based tools, we focus our evaluation on AnDarwin, a tool designed to detect plagiarized Android apps. We show that an adversary can merge arbitrary clusters by connecting them with “bridges”, that even a small number of merges can greatly degrade clustering performance, and that the defender has limited recourse when relying solely on DBSCAN. Finally, we propose a remediation process that uses machine learning and features based on outlier measures that are orthogonal to the underlying clustering problem to detect and remove injected points.
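
The "bridge" idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' attack code and does not involve AnDarwin. It assumes a minimal pure-Python DBSCAN written for illustration, hand-made 2-D points, and hypothetical parameters (`eps=1.0`, `min_pts=3`, where `min_pts` counts the point itself). It only shows that a chain of injected points, spaced closely enough for each to be a core point, makes two well-separated clusters density-connected.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal illustrative DBSCAN; returns one label per point (-1 = noise)."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points extend the cluster
                queue.extend(nbrs)
    return labels

def count_clusters(labels):
    return len({label for label in labels if label != -1})

# Two legitimate clusters, far apart on the x-axis (toy data).
cluster_a = [(0.5 * i, 0.0) for i in range(5)]         # x = 0.0 .. 2.0
cluster_b = [(10.0 + 0.5 * i, 0.0) for i in range(5)]  # x = 10.0 .. 12.0

# Adversarial bridge: injected points spaced 0.5 apart across the gap.
bridge = [(0.5 * i, 0.0) for i in range(5, 20)]        # x = 2.5 .. 9.5

clean = dbscan(cluster_a + cluster_b, eps=1.0, min_pts=3)
poisoned = dbscan(cluster_a + cluster_b + bridge, eps=1.0, min_pts=3)

print(count_clusters(clean), count_clusters(poisoned))  # prints: 2 1
```

In this toy setting, 15 injected points are enough to collapse two clusters into one. The cost of a real bridge depends on the feature space, the distance metric, and the DBSCAN parameters the defender uses.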

Citation

@inproceedings{crussell2015attacking,
  title={Attacking {DBSCAN} for Fun and Profit},
  author={Crussell, Jonathan and Kegelmeyer, Philip},
  booktitle={Proceedings of the 2015 SIAM International Conference on Data Mining},
  pages={235--243},
  year={2015},
  organization={Society for Industrial and Applied Mathematics}
}
