In this work, a novel data-driven methodology for designing polar codes is proposed. The methodology is suited to the case where the channel is given as a "black box": the designer can access the channel to generate observations of its inputs and outputs, but has no access to an explicit channel model. The methodology consists of two components: (1) neural estimation of the sufficient statistic of the channel outputs, using recent advances in Kullback-Leibler (KL) divergence estimation, and (2) a neural successive cancellation (NSC) decoder, in which three neural networks replace the core elements of the successive cancellation (SC) decoder. The parameters of the neural networks are determined during a training phase in which the mutual information of the effective channels is estimated. We demonstrate the performance of the algorithm on memoryless channels and on finite-state channels, and compare the results with the optimal decoding given by the SC and SC trellis decoders, respectively.
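
For orientation, the classical SC decoder that the NSC decoder builds on can be sketched as below. This is a minimal illustration of the standard model-based recursion on log-likelihood ratios (LLRs), not the proposed neural decoder. The function names `check_node`, `bit_node`, and `hard_decision` are hypothetical labels chosen here, and reading them as the three "core elements" that the networks replace is an assumption. The sketch also assumes natural bit ordering (no bit-reversal permutation), frozen bits fixed to zero, and the min-sum approximation of the check-node update.

```python
import numpy as np

def polar_encode(u):
    """Encode u as x = u * F^{(kron) n} over GF(2), F = [[1, 0], [1, 1]], natural order."""
    u = np.asarray(u, dtype=np.int64)
    if u.size == 1:
        return u.copy()
    half = u.size // 2
    u1, u2 = u[:half], u[half:]
    # Block structure of the Kronecker power: x = [enc(u1 xor u2), enc(u2)].
    return np.concatenate([polar_encode(u1 ^ u2), polar_encode(u2)])

def check_node(l1, l2):
    """LLR of a = x_left xor x_right (min-sum approximation, an assumption here)."""
    return np.sign(l1) * np.sign(l2) * np.minimum(np.abs(l1), np.abs(l2))

def bit_node(l1, l2, a_hat):
    """LLR of b = x_right once the estimate a_hat of the upper branch is known."""
    return l2 + (1 - 2 * a_hat) * l1

def hard_decision(llr, frozen):
    """Frozen positions are set to 0; information bits follow the LLR sign."""
    return np.where(frozen, 0, (llr < 0).astype(np.int64))

def sc_decode(llr, frozen):
    """Return (u_hat, x_hat) for channel LLRs, where LLR = log P(y|0) / P(y|1)."""
    llr = np.asarray(llr, dtype=np.float64)
    frozen = np.asarray(frozen, dtype=bool)
    if llr.size == 1:
        u = hard_decision(llr, frozen)
        return u, u.copy()
    half = llr.size // 2
    l1, l2 = llr[:half], llr[half:]
    # Decode the upper half first, then reuse its re-encoded estimate below.
    u1, a_hat = sc_decode(check_node(l1, l2), frozen[:half])
    u2, b_hat = sc_decode(bit_node(l1, l2, a_hat), frozen[half:])
    return np.concatenate([u1, u2]), np.concatenate([a_hat ^ b_hat, b_hat])

# Usage: N = 8 with an illustrative frozen set {0, 1, 2, 4} and a noiseless channel.
frozen = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)
u = np.zeros(8, dtype=np.int64)
u[~frozen] = [1, 0, 1, 1]
x = polar_encode(u)
llr = 10.0 * (1 - 2 * x)  # bit 0 -> +10, bit 1 -> -10
u_hat, x_hat = sc_decode(llr, frozen)
```

In a data-driven setting the channel LLRs used above are not available in closed form, which is why the abstract's first component estimates a sufficient statistic of the channel outputs from samples.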