A reward field model generation in Q-learning by dynamic programming

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A real environment contains many obstacles and paths, which makes it difficult for an agent to learn. Q-learning is suitable in such cases because it does not require a model of the environment. With Q-learning, an agent learns which actions lead to a state in which it receives a reward. However, the agent has no prior information on how to obtain a reward. In the initial learning stage, it sometimes selects an action that moves it to a state in which it cannot receive a reward. Hence, the learning time and learning cost needed to select an optimal action increase. If a model that assists Q-learning can be created automatically, both the time and the cost problems can be addressed together. In this paper, we propose a method that creates such a model automatically by dynamic programming. The model enables an agent to notice when it is close to a state in which it can receive a reward. Once the agent enters the reward's field of influence, the model drives it to the rewarding state. We also conducted experiments comparing the success rate of conventional Q-learning with that of the proposed method. Our results showed that the success rate of the proposed method was approximately 187% higher than that of Q-learning.
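The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes a deterministic grid world (0 = free cell, 1 = obstacle), builds a reward field by a few dynamic-programming sweeps outward from the rewarding state, and lets a tabular Q-learning agent follow the field by steepest ascent whenever it stands on a cell with a non-zero field value. The names `build_reward_field`, `choose_action`, `train`, and the parameters `radius`, `gamma`, `alpha`, and `epsilon` are hypothetical.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(grid, state, action):
    """Move one cell; stay put when the move hits a wall or an obstacle (1)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
        return (r, c)
    return state


def build_reward_field(grid, goal, gamma=0.9, radius=3):
    """Assumed field model: `radius` synchronous Bellman sweeps from the goal.

    A free cell d <= radius moves away from the goal ends up with gamma**d;
    obstacles and cells farther away keep the value 0.0.
    """
    rows, cols = len(grid), len(grid[0])
    field = [[0.0] * cols for _ in range(rows)]
    field[goal[0]][goal[1]] = 1.0
    for _ in range(radius):
        updated = [row[:] for row in field]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue
                # Back up the best discounted value reachable in one move.
                updated[r][c] = max(
                    gamma * field[nr][nc]
                    for nr, nc in (step(grid, (r, c), a) for a in ACTIONS)
                )
        field = updated
    return field


def choose_action(grid, q, field, state, epsilon, rng):
    """Inside the field, climb it; outside, fall back to epsilon-greedy."""
    if field[state[0]][state[1]] > 0.0:
        def neighbour_value(a):
            nr, nc = step(grid, state, ACTIONS[a])
            return field[nr][nc]
        return max(range(len(ACTIONS)), key=neighbour_value)
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(grid, start, goal, field=None, episodes=200, max_steps=50,
          alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the Q-table and the number of successes."""
    rows, cols = len(grid), len(grid[0])
    if field is None:  # conventional Q-learning: no field anywhere
        field = [[0.0] * cols for _ in range(rows)]
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(rows) for c in range(cols)}
    successes = 0
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            a = choose_action(grid, q, field, state, epsilon, rng)
            nxt = step(grid, state, ACTIONS[a])
            reward = 1.0 if nxt == goal else 0.0
            # Standard Q-learning update.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
            if state == goal:
                successes += 1
                break
    return q, successes


if __name__ == "__main__":
    grid = [[0] * 5 for _ in range(5)]
    goal = (4, 4)
    field = build_reward_field(grid, goal)
    _, plain = train(grid, (0, 0), goal)
    _, guided = train(grid, (0, 0), goal, field=field)
    print("successes without field:", plain, "with field:", guided)
```

Passing `field=None` to `train` gives plain Q-learning, so the two runs in the usage block differ only in whether the field is available. The numbers printed by this toy setup are not comparable to the results reported in the paper.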

Original language: English
Title of host publication: Proceedings of 2nd International Conference on Interaction Sciences
Subtitle of host publication: Information Technology, Culture and Human
Pages: 674-679
Number of pages: 6
DOIs
State: Published - 2009
Event: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009 - Seoul, Korea, Republic of
Duration: 24 Nov 2009 → 26 Nov 2009

Publication series

Name: ACM International Conference Proceeding Series
Volume: 403

Conference

Conference: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009
Country/Territory: Korea, Republic of
City: Seoul
Period: 24/11/09 → 26/11/09

Keywords

  • Learning time
  • Reinforcement learning
  • Reward model
