Directed information (DI) is a fundamental measure for the study and analysis of sequential stochastic models. In particular, when optimized over the input distribution, it characterizes the capacity of general communication channels. However, existing optimization methods for discrete input alphabets assume full knowledge of the channel model and are therefore not applicable when only samples are available. We derive a new method that overcomes this limitation and enables optimizing DI over unknown channels. To that end, we formulate the problem as a Markov decision process and leverage reinforcement learning techniques to optimize a deep generative model of the channel input probability mass function (PMF). Combining our optimizer with the DI neural estimator, we obtain an end-to-end estimation-optimization scheme, which we apply to estimate the capacity of various discrete channels with memory. We provide empirical results that demonstrate the utility of the proposed framework and further show how to use the optimized PMF generator to obtain theoretical bounds on the feedback capacity of unifilar finite-state channels.
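
For reference, the quantities named above can be written out as follows. This is a sketch in the standard notation due to Massey, which is an assumption here, since the text itself fixes no symbols. $X^n$ and $Y^n$ denote the channel input and output sequences of length $n$, and $P(x^n \| y^{n-1})$ denotes a causally conditioned input distribution. The capacity expressions are meant to hold only under suitable regularity conditions on the channel, where the limits exist.

```latex
% Requires amsmath. The symbols X^n, Y^n, C, and C_FB are illustrative
% notation and are not taken from the text above.
\begin{align}
  % Directed information from the input sequence to the output sequence
  I(X^n \to Y^n) &= \sum_{i=1}^{n} I\bigl(X^i ; Y_i \mid Y^{i-1}\bigr), \\
  % Causally conditioned input distribution (input may depend on past outputs)
  P(x^n \,\|\, y^{n-1}) &= \prod_{i=1}^{n} P\bigl(x_i \mid x^{i-1}, y^{i-1}\bigr), \\
  % Capacity without feedback: optimize over input distributions P(x^n)
  C &= \lim_{n \to \infty} \frac{1}{n} \max_{P(x^n)} I(X^n \to Y^n), \\
  % Feedback capacity: optimize over causally conditioned distributions
  C_{\mathrm{FB}} &= \lim_{n \to \infty} \frac{1}{n}
      \max_{P(x^n \| y^{n-1})} I(X^n \to Y^n).
\end{align}
```

Without feedback, the input does not depend on past outputs, and the directed information in the third line coincides with the mutual information $I(X^n; Y^n)$.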