Abstract
We consider a long-term average profit-maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue length of the system. Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24] shows that, if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full-information model of Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24]. We show that the algorithm achieves an O(1) regret when all optimal thresholds with full information are nonzero and achieves an O(ln^{1+ε}(N)) regret for any specified ε > 0 in the case that an optimal threshold with full information is 0 (i.e., an optimal policy is to reject all arrivals), where N is the number of arrivals.
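As an illustration of the full-information benchmark mentioned in the abstract, the following is a minimal sketch of a static threshold policy and of the long-run average profit it yields when the rates are known. It is not the paper's learning algorithm. The function names, the `max_threshold` search cap, and the choice to charge the holding cost for every customer present in the system (including the one in service) are assumptions made for this sketch.

```python
def admit(queue_length: int, threshold: int) -> bool:
    """Static threshold rule: admit only if the queue is shorter than the threshold."""
    return queue_length < threshold


def average_profit(lam: float, mu: float, reward: float, cost: float, threshold: int) -> float:
    """Long-run average profit per unit time under a threshold policy with known rates.

    Assumption (not from the source): the cost is charged per unit time for
    every customer present in the system, including the one in service.
    """
    if threshold == 0:
        return 0.0  # reject-all policy: no reward collected, no cost incurred
    rho = lam / mu
    # Under threshold K the queue length is a birth-death chain on {0, ..., K}
    # with stationary probabilities proportional to rho**n.
    weights = [rho ** n for n in range(threshold + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    throughput = lam * (1.0 - pi[threshold])  # admitted arrivals per unit time
    mean_in_system = sum(n * p for n, p in enumerate(pi))
    return reward * throughput - cost * mean_in_system


def best_threshold(lam: float, mu: float, reward: float, cost: float,
                   max_threshold: int = 50) -> int:
    """Brute-force search for a profit-maximizing threshold (hypothetical search cap)."""
    return max(range(max_threshold + 1),
               key=lambda k: average_profit(lam, mu, reward, cost, k))
```

For example, `best_threshold(1.0, 1.0, 0.5, 1.0)` selects the reject-all threshold 0, which is the case the abstract treats separately. With `reward=10.0` the thresholds 3 and 4 give essentially the same profit under these assumptions, which shows why the abstract speaks of "all optimal thresholds" rather than a single one.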
Original language | English |
---|---|
Pages (from-to) | 69-107 |
Number of pages | 39 |
Journal | Stochastic Systems |
Volume | 14 |
Issue number | 1 |
State | Published - 1 Mar 2024 |
Externally published | Yes |
Keywords
- queueing systems with uncertainty
- reinforcement learning
ASJC Scopus subject areas
- Statistics and Probability
- Modeling and Simulation
- Statistics, Probability and Uncertainty
- Management Science and Operations Research