Scheduling is one of the most challenging and influential resource allocation tasks performed by the base station in cellular networks such as 4G and 5G, and it strongly affects overall performance. It must balance two key performance metrics, throughput and fairness, which fundamentally conflict: maximizing one may come at the expense of the other. On the one hand, maximizing throughput, which is the goal of many communication networks, requires allocating resources to users with good channel conditions. On the other hand, fairness requires allocating some resources to users with poor channel conditions. One prevalent scheduling scheme maximizes the proportional fairness criterion, which balances the two metrics with minimal compromise. Proportional-fair schedulers commonly rely on a greedy approach in which each resource block is allocated to the user that maximizes the proportional fairness criterion. However, users can typically tolerate some delay, especially if it boosts their performance. Motivated by this observation, we propose a reinforcement-learning-based proportional-fair scheduler for cellular networks. The proposed scheduler incorporates users' current channel estimates together with predicted future channel estimates into the resource allocation process, in order to maximize the proportional fairness criterion over predefined periodic time epochs. We developed a reinforcement learning tool that learns the users' channel fluctuations and selects the best user at each time slot so as to achieve the best throughput-fairness trade-off over multiple time slots. We demonstrate through simulations that such a scheduler outperforms the standard proportional-fair scheduler. We further implemented the proposed scheme on a live 4G base station, also known as an eNodeB, and observed similar gains.
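The greedy baseline described above can be sketched briefly. This is a minimal illustration under simplifying assumptions (one resource block per slot, known instantaneous rates), not the paper's implementation: each slot, the resource block goes to the user maximizing the PF metric r_u / T_u, where T_u is an exponentially weighted moving average of past throughput; the names `pf_schedule`, `rates`, `avg_tput`, and `alpha` are hypothetical.

```python
def pf_schedule(rates, avg_tput, alpha=0.1):
    """Greedy proportional-fair step: allocate one resource block to the
    user maximizing r_u / T_u, then update every user's average throughput."""
    # Pick the user with the largest PF metric for this slot.
    chosen = max(range(len(rates)), key=lambda u: rates[u] / avg_tput[u])
    # EWMA update: the chosen user accrues its instantaneous rate,
    # all other users accrue zero for this slot.
    new_avg = [
        (1 - alpha) * t + alpha * (rates[u] if u == chosen else 0.0)
        for u, t in enumerate(avg_tput)
    ]
    return chosen, new_avg

# User 1 has twice the instantaneous rate, but its large past throughput
# lowers its PF metric, so the scheduler favors user 0.
chosen, avg = pf_schedule([1.0, 2.0], [1.0, 5.0])
```

Because the metric divides by past throughput, users left unserved see their denominator decay and are eventually scheduled, which is the fairness mechanism the greedy scheme provides within a single slot.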