TY - GEN
T1 - A reward field model generation in Q-learning by dynamic programming
AU - Sung, Yunsick
AU - Cho, Kyungeun
AU - Um, Kyhyun
PY - 2009
Y1 - 2009
N2 - Many obstacles and paths exist in a real environment, and hence, it is difficult for an agent to learn such an environment. Q-learning is suitable in such cases because it does not require a predefined learning model. Through Q-learning, an agent learns to reach a state wherein it can receive a reward for the action it selects. However, no information on how to receive a reward is available. In the initial learning stage, an agent sometimes selects an action that moves it to a state wherein it cannot receive a reward; hence, the learning time and learning cost required to select an optimal action are increased. If a model that assists an agent learning by Q-learning can be created automatically, both the time and cost problems can be solved together. In this paper, we propose a method that creates such a model automatically by dynamic programming. This model enables an agent to notice when it is close to a state wherein it can receive a reward. When the agent enters the field of reward affection, the model drives it toward the state wherein it can receive a reward. Through experiments, we compared the success rate of conventional Q-learning with that of the proposed method. Our results showed that the success rate of the proposed method was approximately 187% higher than that of Q-learning.
AB - Many obstacles and paths exist in a real environment, and hence, it is difficult for an agent to learn such an environment. Q-learning is suitable in such cases because it does not require a predefined learning model. Through Q-learning, an agent learns to reach a state wherein it can receive a reward for the action it selects. However, no information on how to receive a reward is available. In the initial learning stage, an agent sometimes selects an action that moves it to a state wherein it cannot receive a reward; hence, the learning time and learning cost required to select an optimal action are increased. If a model that assists an agent learning by Q-learning can be created automatically, both the time and cost problems can be solved together. In this paper, we propose a method that creates such a model automatically by dynamic programming. This model enables an agent to notice when it is close to a state wherein it can receive a reward. When the agent enters the field of reward affection, the model drives it toward the state wherein it can receive a reward. Through experiments, we compared the success rate of conventional Q-learning with that of the proposed method. Our results showed that the success rate of the proposed method was approximately 187% higher than that of Q-learning.
KW - Learning time
KW - Reinforcement learning
KW - Reward model
UR - http://www.scopus.com/inward/record.url?scp=74949119639&partnerID=8YFLogxK
U2 - 10.1145/1655925.1656047
DO - 10.1145/1655925.1656047
M3 - Conference contribution
AN - SCOPUS:74949119639
SN - 9781605587103
T3 - ACM International Conference Proceeding Series
SP - 674
EP - 679
BT - Proceedings of 2nd International Conference on Interaction Sciences
T2 - 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009
Y2 - 24 November 2009 through 26 November 2009
ER -