Helix: DGA Domain Embeddings for Tracking and Exploring Botnets

Lior Sidi, Yisroel Mirsky, Asaf Nadler, Yuval Elovici, Asaf Shabtai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

Botnets have been using domain generation algorithms (DGA) for over a decade to covertly and robustly identify the domain name of their command and control (C&C) servers. Recent advancements in DGA detection have motivated botnet owners to rapidly alter the C&C domain and use adversarial techniques to evade detection. As a result, it has become increasingly difficult to track botnets in DNS traffic. In this paper, we present Helix, a method for tracking and exploring botnets. Helix uses a spatio-temporal deep neural network autoencoder to convert domains into numerical vectors (embeddings) which capture the DGA and seed used to create the domain. This is made possible by leveraging both convolutional (spatial) and recurrent (temporal) layers, and by using techniques such as attention mechanisms and highways. Furthermore, by using an autoencoder architecture, the network can be trained in an unsupervised manner (no labeling of data), which makes the system practical for real-world deployments. In our evaluation, we found that Helix can track botnet campaigns, distinguish between DGA families and seeds, and identify domains generated using the latest adversarial machine learning techniques. Helix is currently being used to track botnets in one of the world's largest Internet Service Providers (ISP), and we include some of the ISP's analysis work using our method.
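As a rough illustration of the terms used in the abstract, the sketch below shows how a seed and a date can deterministically produce candidate domains, and how a domain string can be turned into a fixed-length integer sequence of the kind a character-level neural network typically consumes. This is not the paper's code or any real botnet's algorithm: `toy_dga`, `encode_domain`, the SHA-256 construction, the alphabet, and the length limits are all assumptions made up for this example.

```python
import hashlib
from datetime import date
from typing import List

def toy_dga(seed: str, day: date, count: int = 3,
            length: int = 12, tld: str = ".com") -> List[str]:
    """Hypothetical DGA: hash (seed, date, index) and map hex digits to letters."""
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Each hex digit (0-15) becomes one of the letters 'a'..'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + tld)
    return domains

# Assumed character vocabulary; id 0 is reserved for padding.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_ID = len(ALPHABET) + 1

def encode_domain(domain: str, max_len: int = 32) -> List[int]:
    """Map a domain to a fixed-length list of character ids (truncate, then pad)."""
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

if __name__ == "__main__":
    for name in toy_dga("campaign-a", date(2020, 10, 19)):
        print(name, encode_domain(name)[:8])
```

The same seed and date always yield the same list, while changing the seed yields a different one. An embedding model such as the one the abstract describes would presumably aim to place domains that share a generator and seed close together, but the encoding above is only one plausible input representation.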

Original language: English
Title of host publication: CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 2741-2748
Number of pages: 8
ISBN (Electronic): 9781450368599
State: Published - 19 Oct 2020
Event: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 - Virtual, Online, Ireland
Duration: 19 Oct 2020 to 23 Oct 2020

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Country/Territory: Ireland
City: Virtual, Online
Period: 19/10/20 to 23/10/20

Keywords

  • autoencoder
  • botnet
  • cnn
  • dga
  • dns
  • embedding
  • lstm
