Authors
Michael Kearns, Satinder Singh
Publication date
2002/11
Journal
Machine Learning
Volume
49
Pages
209-232
Publisher
Kluwer Academic Publishers
Description
We present new algorithms for reinforcement learning and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is bounded below by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states and actions, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the exploration-exploitation trade-off.
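To illustrate the "explicit explore or exploit" idea described above, here is a minimal, hedged sketch. The toy random MDP, the visit threshold m_known, the horizon T, and the crude exploit-vs-explore decision rule are illustrative assumptions and not the paper's actual constructions or constants; the sketch only shows the general shape of maintaining a set of "known" states, doing balanced wandering in unknown states, and planning in an empirical model to either exploit or seek out unknown states.

```python
import numpy as np

# Hedged sketch of an "explore or exploit" loop on a toy tabular MDP.
# The MDP, m_known, T, and the decision rule below are illustrative only.

rng = np.random.default_rng(0)
n_states, n_actions = 6, 2

# Illustrative random MDP: P[s, a, s'] transition probabilities, R[s, a] rewards in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

T = 15         # planning horizon, standing in for the mixing/horizon time
m_known = 20   # visits per (state, action) before a state counts as "known"

counts = np.zeros((n_states, n_actions), dtype=int)
trans = np.zeros((n_states, n_actions, n_states), dtype=int)
rew_sum = np.zeros((n_states, n_actions))

def is_known(s):
    return bool(np.all(counts[s] >= m_known))

def greedy_policy(P_hat, R_hat):
    """T-step value iteration on an empirical model; returns (policy, values)."""
    V = np.zeros(n_states)
    for _ in range(T):
        Q = R_hat + np.einsum('sap,p->sa', P_hat, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

def empirical_models():
    """Empirical MDP over known states; unknown states become self-loops that
    pay 0 under exploitation and 1 under exploration (a crude stand-in for an
    absorbing 'unknown' state)."""
    P_hat = np.zeros_like(P)
    R_exploit = np.zeros_like(R)
    R_explore = np.zeros_like(R)
    for s in range(n_states):
        for a in range(n_actions):
            if is_known(s):
                P_hat[s, a] = trans[s, a] / counts[s, a]
                R_exploit[s, a] = rew_sum[s, a] / counts[s, a]
            else:
                P_hat[s, a, s] = 1.0
                R_explore[s, a] = 1.0
    return P_hat, R_exploit, R_explore

s = 0
total_reward = 0.0
n_steps = 20000
for step in range(n_steps):
    if not is_known(s):
        a = int(counts[s].argmin())          # balanced wandering: try least-tried action
    else:
        P_hat, R_exploit, R_explore = empirical_models()
        pi_exploit, V_exploit = greedy_policy(P_hat, R_exploit)
        pi_explore, V_explore = greedy_policy(P_hat, R_explore)
        # Simplified decision: exploit when the estimated T-step return is already
        # decent, otherwise follow the policy that reaches unknown states fastest.
        a = int(pi_exploit[s]) if V_exploit[s] >= 0.5 * T else int(pi_explore[s])
    # Execute the action in the true MDP and update the empirical model.
    s_next = int(rng.choice(n_states, p=P[s, a]))
    r = R[s, a]
    counts[s, a] += 1
    trans[s, a, s_next] += 1
    rew_sum[s, a] += r
    total_reward += r
    s = s_next

print(f"average per-step reward: {total_reward / n_steps:.3f}")
```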
Total citations
Yearly citation counts, 2001-2024 (Google Scholar citation histogram)