Abstract
In this paper, we propose a method for learning representation layers with squashing activation functions in a deep artificial neural network that directly addresses the vanishing gradients problem. The proposed solution is derived by solving for the maximum likelihood estimator of the components of the posterior representation, which are approximately Beta-distributed, formulated in the context of variational inference. This approach not only improves the performance of deep neural networks with squashing activation functions on some of the hidden layers, including in discriminative learning, but can also be employed to produce sparse codes.
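As background to the problem the abstract names, the sketch below illustrates the vanishing gradients effect itself (it is not the paper's proposed method): the derivative of a squashing activation such as the logistic sigmoid is at most 0.25, so gradients backpropagated through many such layers shrink geometrically. All names and the layer count here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
grad = np.ones_like(x)  # upstream gradient at the network output

# Backpropagate through 20 stacked sigmoid layers (weights omitted for
# clarity); sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) <= 0.25, so each
# layer multiplies the gradient by a factor of at most 0.25.
for layer in range(20):
    a = sigmoid(x)
    grad = grad * a * (1.0 - a)
    x = a  # feed activations forward to the next layer

print(np.max(np.abs(grad)))  # far below 1: the gradient has vanished
```

After 20 layers the largest gradient component is bounded by 0.25**20 (about 1e-12), which is why deep stacks of squashing activations train poorly without countermeasures like the one the paper proposes.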
| Original language | English |
|---|---|
| Pages (from-to) | 2456-2470 |
| Number of pages | 15 |
| Journal | Applied Intelligence |
| Volume | 51 |
| Issue number | 4 |
| DOIs | |
| State | Published - 1 Apr 2021 |
| Externally published | Yes |
Keywords
- Beta distribution
- Infomax
- Learning representations
- Vanishing gradients
ASJC Scopus subject areas
- Artificial Intelligence