Brief Announcement: Gradual Learning of Deep Recurrent Neural Network

Ziv Aharoni, Gal Rattner, Haim Permuter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Deep Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, deep RNNs are difficult to train and tend to suffer from overfitting. Motivated by the Data Processing Inequality (DPI), we formulate the multi-layered network as a Markov chain and introduce a training method that comprises training the network gradually and applying layer-wise gradient clipping. We found that applying our methods, combined with previously introduced regularization and optimization methods, improved state-of-the-art architectures on language modeling tasks.
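The abstract mentions layer-wise gradient clipping, i.e. rescaling each layer's gradient against its own threshold rather than clipping the global norm. A minimal NumPy sketch of that idea follows; the per-layer thresholds and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def clip_layerwise(grads, max_norms):
    """Clip each layer's gradient to its own norm threshold.

    grads: list of per-layer gradient arrays.
    max_norms: list of per-layer clipping thresholds (hypothetical
    values; the paper's exact thresholds are not specified here).
    """
    clipped = []
    for g, c in zip(grads, max_norms):
        norm = np.linalg.norm(g)
        if norm > c:
            # Rescale so this layer's gradient norm equals its threshold c.
            g = g * (c / norm)
        clipped.append(g)
    return clipped
```

In contrast to global-norm clipping, each layer here is rescaled independently, so a large gradient in one layer does not shrink the updates of the others.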

Original language: English
Title of host publication: Cyber Security Cryptography and Machine Learning - Second International Symposium, CSCML 2018, Proceedings
Editors: Itai Dinur, Shlomi Dolev, Sachin Lodha
Publisher: Springer Verlag
Pages: 274-277
Number of pages: 4
ISBN (Print): 9783319941462
DOIs
State: Published - 17 Jun 2018
Event: 2nd International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2018 - Beer-Sheva, Israel
Duration: 21 Jun 2018 → 22 Jun 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10879 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2018
Country/Territory: Israel
City: Beer-Sheva
Period: 21/06/18 → 22/06/18

Keywords

  • Data processing inequality
  • Machine learning
  • Recurrent neural networks
  • Regularization
  • Training methods

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
