Abstract
A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.
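For orientation, the long-run average reward of a fixed stationary policy in an ergodic semi-Markov decision process is commonly written in the ratio form sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $f$ is a stationary policy, $\pi_f$ is the stationary distribution of the embedded chain under $f$, $r(i,a)$ is the expected reward earned in one transition from state $i$ under action $a$, and $\tau(i,a)$ is the expected length of that transition. Both sums are assumed to converge.

```latex
% Illustrative notation only (not the paper's): ratio form of the
% long-run average reward under a stationary policy f.
\[
  g(f) \;=\;
  \frac{\sum_{i} \pi_f(i)\, r\bigl(i, f(i)\bigr)}
       {\sum_{i} \pi_f(i)\, \tau\bigl(i, f(i)\bigr)},
  \qquad
  f^{*} \text{ is average-optimal if } g(f^{*}) \ge g(f) \text{ for every policy } f .
\]
```

In this reading, the abstract's result says that under its stated conditions the optimum can be attained by a stationary policy.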
Original language | English |
---|---|
Pages (from-to) | 301-309 |
Number of pages | 9 |
Journal | Journal of Applied Probability |
Volume | 19 |
Issue number | 2 |
State | Published - 1 Jan 1982 |
ASJC Scopus subject areas
- Statistics and Probability
- Mathematics (all)
- Statistics, Probability and Uncertainty