Solving continual combinatorial selection via deep reinforcement learning

Hyungseok Song, Hyeryung Jang, Hai H. Tran, Se Eun Yoon, Kyunghwan Son, Donggyu Yun, Hyoju Chung, Yung Yi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm that addresses this issue through the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. The IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections, which reduces the number of actions at the expense of an exponential increase in the number of states. Second, we overcome this state space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
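
The decomposition described in the abstract can be illustrated with a small sketch. It is not the authors' network: the linear scorer `shared_q`, the weight names (`self`, `rem`, `inter`), and the scalar item features are all hypothetical, chosen only to show (a) how selecting K of N items jointly compares in action count with K one-at-a-time selections, and (b) how a scorer that reuses the same weights for every item stays consistent under item reordering and applies unchanged to item sets of a different size.

```python
import math


def action_counts(n_items, k):
    """Return (joint actions for picking k of n at once, actions per iterative sub-step)."""
    joint = math.comb(n_items, k)   # choose all k items simultaneously
    per_step = n_items              # choose one item at a time (at most n candidates)
    return joint, per_step


def shared_q(features, selected, remaining, w):
    """Hypothetical weight-shared scorer: the same weights w score every remaining item.

    Each item's value depends on its own feature plus order-independent summaries
    (sum over selected items, mean over remaining items), so permuting the items
    permutes the scores in the same way.
    """
    sel_sum = sum(features[i] for i in selected)
    rem_mean = sum(features[i] for i in remaining) / len(remaining)
    return {
        i: w["self"] * features[i]
        + w["rem"] * rem_mean
        + w["inter"] * features[i] * sel_sum
        for i in remaining
    }


def iterative_select(features, k, w):
    """Pick k items greedily, one per sub-step, using the shared scorer."""
    selected = []
    remaining = list(range(len(features)))
    for _ in range(k):
        q = shared_q(features, selected, remaining, w)
        best = max(remaining, key=lambda i: q[i])
        selected.append(best)
        remaining.remove(best)
    return selected


# Assumed illustrative weights; in a learned setting these would be trained.
weights = {"self": 1.0, "rem": 0.5, "inter": -0.1}

joint, per_step = action_counts(50, 5)
print(joint, per_step)

picked = iterative_select([3, 1, 4, 2, 5], 2, weights)
print(picked)
```

Because the weights are tied across items rather than to item positions, the same `weights` dictionary can be passed to `iterative_select` with a feature list of any length, which loosely mirrors the abstract's claim about transferring to item spaces of different sizes.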

Original language: English
Title of host publication: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Editors: Sarit Kraus
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 3467-3474
Number of pages: 8
ISBN (Electronic): 9780999241141
State: Published - 2019
Event: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
Duration: 10 Aug 2019 to 16 Aug 2019

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2019-August
ISSN (Print): 1045-0823

Conference

Conference: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Country/Territory: China
City: Macao
Period: 10/08/19 to 16/08/19
