Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
Joint Authors
Zhu, Quanxin
Huang, Chuangxia
Yang, Xinsong
Source
Issue
Vol. 2009, Issue 2009 (31 Dec. 2009), pp.1-17, 17 p.
Publisher
Hindawi Publishing Corporation
Publication Date
2010-01-27
Country of Publication
Egypt
No. of Pages
17
Main Subjects
Abstract EN
We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces.
The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds.
The optimality criterion we consider is the expected average reward.
We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA.
Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
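To illustrate the kind of iteration the abstract describes, here is a minimal finite-state, finite-action sketch of average-reward policy iteration for a continuous-time jump MDP. This is only a toy example with hypothetical transition-rate and reward data; the paper itself treats general Polish state and action spaces with possibly unbounded rates, which this finite sketch does not capture. Each evaluation step solves the policy's average reward optimality (Poisson) equation r_pi - g·1 + Q_pi h = 0 for the gain g and bias h, and each improvement step maximizes r(s,a) + Σ_s' q(s'|s,a) h(s') per state.

```python
import numpy as np

# Hypothetical 2-state, 2-action continuous-time MDP (toy data, not from the paper).
# q[a][s, s'] : transition rates under action a (each row sums to 0).
# r[s, a]     : reward rate in state s under action a.
n_states = 2
q = np.array([
    [[-3.0, 3.0], [2.0, -2.0]],   # rates under action 0
    [[-1.0, 1.0], [4.0, -4.0]],   # rates under action 1
])
r = np.array([[5.0, 1.0],
              [0.0, 3.0]])

def evaluate(policy):
    """Solve r_pi - g*1 + Q_pi h = 0 with the normalization h[0] = 0."""
    Q = np.array([q[policy[s], s] for s in range(n_states)])
    rp = np.array([r[s, policy[s]] for s in range(n_states)])
    # Unknown vector is (g, h[1], ..., h[n-1]); rearrange to g - (Q h)[s] = rp[s].
    A = np.hstack([np.ones((n_states, 1)), -Q[:, 1:]])
    sol = np.linalg.solve(A, rp)
    return sol[0], np.concatenate([[0.0], sol[1:]])

def improve(h):
    """Per state, pick the action maximizing r(s,a) + sum_s' q(s'|s,a) h(s')."""
    vals = r + np.einsum('asx,x->sa', q, h)
    return vals.argmax(axis=1)

policy = np.zeros(n_states, dtype=int)
for _ in range(20):
    g, h = evaluate(policy)
    new_policy = improve(h)
    if np.array_equal(new_policy, policy):   # policy stable: stop
        break
    policy = new_policy
```

On this toy instance the iteration stabilizes after one improvement at the policy (0, 1), whose gain g is the optimal average reward; in the paper's general setting the analogous convergence is exactly what is proved under the stated conditions.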
American Psychological Association (APA)
Zhu, Quanxin, Yang, Xinsong, & Huang, Chuangxia. 2010. Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces. Abstract and Applied Analysis, Vol. 2009, no. 2009, pp. 1-17.
https://search.emarefa.net/detail/BIM-446602
Modern Language Association (MLA)
Zhu, Quanxin, et al. Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces. Abstract and Applied Analysis, no. 2009 (2009), pp. 1-17.
https://search.emarefa.net/detail/BIM-446602
American Medical Association (AMA)
Zhu, Quanxin, Yang, Xinsong, Huang, Chuangxia. Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces. Abstract and Applied Analysis. 2010. Vol. 2009, no. 2009, pp. 1-17.
https://search.emarefa.net/detail/BIM-446602
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-446602