Deep dive into authorship verification of email messages with convolutional neural network

Marina Litvak

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

13 Scopus citations

Abstract

Authorship verification is the task of determining whether or not a specific individual wrote a given text, which naturally reduces to a binary classification problem. This paper deals with the authorship verification of short email messages. Hereafter, we use “message” to refer to the content of the information transmitted by email. The proposed method implements the binary classification with a sequence-to-sequence (seq2seq) model and trains a convolutional neural network (CNN) on positive examples (written by the “target” user) and negative examples (written by “someone else”). Unlike previously published works, which represent text by numerous stylometric features, the proposed method requires neither advanced text preprocessing nor explicit feature extraction. All messages are submitted to the CNN “as is,” after padding them to the maximal length and replacing every word with its ID number. The CNN learns the most appropriate features through backpropagation and then performs classification. Experiments on the Enron dataset using the TensorFlow framework show that the CNN classifier verifies message authorship very accurately.
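The input pipeline and classifier described in the abstract can be sketched as follows. This is a minimal illustration in plain Python rather than TensorFlow so that it is self-contained, and it is not the paper's implementation. Assumptions not stated in the source: whitespace tokenization, reserved IDs 0 (padding) and 1 (unknown word), and a single-layer text CNN (one convolution width, ReLU, max-over-time pooling, sigmoid output), which is a common design for text classification but not necessarily the authors' architecture. All function and parameter names (`build_vocab`, `encode`, `cnn_score`, `PAD_ID`, `UNK_ID`) are hypothetical. The weights are random. In the paper, the network is trained with backpropagation on positive and negative examples, a step this sketch omits.

```python
import math
import random
from typing import Dict, List

PAD_ID = 0  # hypothetical: ID reserved for padding
UNK_ID = 1  # hypothetical: ID for words unseen during vocabulary building


def build_vocab(messages: List[str]) -> Dict[str, int]:
    """Give each whitespace-separated token an ID, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for msg in messages:
        for word in msg.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # IDs 0 and 1 are reserved
    return vocab


def encode(messages: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Replace every word by its ID, then right-pad each message to the maximal length."""
    ids = [[vocab.get(w, UNK_ID) for w in msg.split()] for msg in messages]
    max_len = max(len(seq) for seq in ids)
    return [seq + [PAD_ID] * (max_len - len(seq)) for seq in ids]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cnn_score(ids, emb, kernels, biases, out_w, out_b) -> float:
    """Forward pass: embed, 1D-convolve with ReLU, max-pool over time, sigmoid.

    Returns the estimated probability that the target user wrote the message.
    """
    x = [emb[i] for i in ids]  # seq_len x dim
    features = []
    for kernel, b in zip(kernels, biases):  # kernel: width x dim
        width = len(kernel)
        if len(x) < width:
            raise ValueError("message shorter than convolution width")
        acts = []
        for t in range(len(x) - width + 1):
            s = b
            for j in range(width):
                s += sum(kw * xv for kw, xv in zip(kernel[j], x[t + j]))
            acts.append(max(0.0, s))  # ReLU
        features.append(max(acts))  # max-over-time pooling
    z = out_b + sum(w * f for w, f in zip(out_w, features))
    return sigmoid(z)


# Usage: two toy messages, randomly initialised (untrained) weights.
messages = ["please review the attached contract", "see you at lunch"]
vocab = build_vocab(messages)
batch = encode(messages, vocab)  # [[2, 3, 4, 5, 6], [7, 8, 9, 10, 0]]

rng = random.Random(0)
dim, n_filters, width = 8, 4, 3
vocab_size = len(vocab) + 2
emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(width)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters
out_w = [rng.uniform(-0.1, 0.1) for _ in range(n_filters)]
probs = [cnn_score(seq, emb, kernels, biases, out_w, 0.0) for seq in batch]
```

Because the convolution weights are shared across positions and followed by max pooling, the learned features detect word n-grams anywhere in the message. This is one plausible reason such a model can work without hand-crafted stylometric features. With untrained weights, the scores stay close to 0.5.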

Original language: English
Title of host publication: Information Management and Big Data - 5th International Conference, SIMBig 2018, Proceedings
Editors: Juan Antonio Lossio-Ventura, Denisse Muñante, Hugo Alatrista-Salas
Publisher: Springer Verlag
Pages: 129-136
Number of pages: 8
ISBN (Print): 9783030116798
State: Published - 1 Jan 2019
Externally published: Yes
Event: 5th International Conference on Information Management and Big Data, SIMBig 2018 - Lima, Peru
Duration: 3 Sep 2018 to 5 Sep 2018

Publication series

Name: Communications in Computer and Information Science
Volume: 898
ISSN (Print): 1865-0929

Conference

Conference: 5th International Conference on Information Management and Big Data, SIMBig 2018
Country/Territory: Peru
City: Lima
Period: 3/09/18 to 5/09/18

Keywords

  • Authorship verification
  • Binary classification
  • Convolutional neural network

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics
