Jonathan Crussell
PhD Dissertation, University of California, Davis (2014)
Abstract
Smartphones are rapidly becoming a fixture of modern-day life. Their popularity and market penetration have given rise to a flourishing ecosystem of mobile apps that provide users with a wide range of useful functionality. Android users may download apps from Google’s official Android Market or from a number of third-party markets. To ensure a healthy mobile app environment, users should have access to high-quality apps and developers should be financially compensated for their efforts. However, apps may be copied, or “cloned,” by a dishonest developer and released as her own, diverting revenue from the original developer or possibly including additional malicious functionality.
I present two approaches to detect similar Android apps based on semantic information. I implement the first approach in a tool called DNADroid, which robustly computes the similarity between two apps by comparing program dependency graphs between methods in candidate apps. The second approach, implemented in a tool called AnDarwin, is capable of detecting similar apps on an unprecedented scale. In contrast to earlier approaches, AnDarwin has four advantages: it avoids comparing apps pairwise, thus greatly improving its scalability; it analyzes only the app code and does not rely on other information (such as the app’s market, signature, or description), thus greatly increasing its reliability; it can detect both full and partial app similarity; and it can automatically detect library code and remove it from the similarity analysis. I evaluate DNADroid and AnDarwin on many Android apps crawled from multiple Android markets, including the official Android Market. My evaluation demonstrates these tools’ ability to accurately detect similar apps. Finally, I show how DNADroid and AnDarwin can be used in conjunction with other tools to gain insights into the app ecosystem, such as the prevalence of malware families that commit ad fraud.
Citation
@phdthesis{crussell2014scalable,
title={Scalable Semantics-Based Detection of Similar Android Apps: Design, Implementation, and Applications},
author={Crussell, Jonathan},
year={2014},
school={University of California, Davis}
}