TY - JOUR
T1 - Enhanced dqn framework for selecting actions and updating replay memory considering massive non-executable actions
AU - Gu, Bonwoo
AU - Sung, Yunsick
N1 - Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/12/1
Y1 - 2021/12/1
N2 - A Deep-Q-Network (DQN) controls a virtual agent as the level of a player using only screenshots as inputs. Replay memory selects a limited number of experience replays according to an arbitrary batch size and updates them using the associated Q-function. Hence, relatively fewer experience replays of different states are utilized when the number of states is fixed and the state of the randomly selected transitions becomes identical or similar. The DQN may not be applicable in some environments where it is necessary to perform the learning process using more experience replays than is required by the limited batch size. In addition, because it is unknown whether each action can be executed, a problem of an increasing amount of repetitive learning occurs as more non-executable actions are selected. In this study, an enhanced DQN framework is proposed to resolve the batch size problem and reduce the learning time of a DQN in an environment with numerous non-executable actions. In the proposed framework, non-executable actions are filtered to reduce the number of selectable actions to identify the optimal action for the current state. The proposed method was validated in Gomoku, a strategy board game, in which the application of a traditional DQN would be difficult.
AB - A Deep-Q-Network (DQN) controls a virtual agent as the level of a player using only screenshots as inputs. Replay memory selects a limited number of experience replays according to an arbitrary batch size and updates them using the associated Q-function. Hence, relatively fewer experience replays of different states are utilized when the number of states is fixed and the state of the randomly selected transitions becomes identical or similar. The DQN may not be applicable in some environments where it is necessary to perform the learning process using more experience replays than is required by the limited batch size. In addition, because it is unknown whether each action can be executed, a problem of an increasing amount of repetitive learning occurs as more non-executable actions are selected. In this study, an enhanced DQN framework is proposed to resolve the batch size problem and reduce the learning time of a DQN in an environment with numerous non-executable actions. In the proposed framework, non-executable actions are filtered to reduce the number of selectable actions to identify the optimal action for the current state. The proposed method was validated in Gomoku, a strategy board game, in which the application of a traditional DQN would be difficult.
KW - Deep-Q-Network
KW - Game artificial intelligence
KW - Gomoku
KW - Reinforcement learning
KW - Replay memory
UR - http://www.scopus.com/inward/record.url?scp=85119997214&partnerID=8YFLogxK
U2 - 10.3390/app112311162
DO - 10.3390/app112311162
M3 - Article
AN - SCOPUS:85119997214
SN - 2076-3417
VL - 11
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 23
M1 - 11162
ER -