TY - GEN
T1 - Adiabatic Persistent Contrastive Divergence learning
AU - Jang, Hyeryung
AU - Choi, Hyungwon
AU - Yi, Yung
AU - Shin, Jinwoo
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/8/9
Y1 - 2017/8/9
N2 - This paper studies parameter learning in graphical models with latent variables, where the standard approach is the expectation-maximization (EM) algorithm, which alternates expectation (E) and maximization (M) steps. However, both the E and M steps are computationally intractable for high-dimensional data, and substituting a faster surrogate for either step to combat this intractability can cause a failure of convergence. To address this issue, the Contrastive Divergence (CD) learning scheme has become popular in the deep learning community; it runs a mean-field approximation in the E step and a few cycles of Markov chains (MC) in the M step. In this paper, we analyze a variant of CD, called Adiabatic Persistent Contrastive Divergence (APCD), which runs a few cycles of MCs in both the E and M steps. Using multi-time-scale stochastic approximation theory, we prove that APCD converges to a correct optimum, whereas standard CD cannot provide such a guarantee due to the mean-field approximation gap in the E step. Despite this stronger theoretical guarantee, a possible practical drawback of APCD is slow mixing in the E step. To address this issue, we also design a hybrid approach that applies both mean-field and MC approximations in the E step; it outperforms the standard mean-field-based CD in our experiments on real-world datasets.
AB - This paper studies parameter learning in graphical models with latent variables, where the standard approach is the expectation-maximization (EM) algorithm, which alternates expectation (E) and maximization (M) steps. However, both the E and M steps are computationally intractable for high-dimensional data, and substituting a faster surrogate for either step to combat this intractability can cause a failure of convergence. To address this issue, the Contrastive Divergence (CD) learning scheme has become popular in the deep learning community; it runs a mean-field approximation in the E step and a few cycles of Markov chains (MC) in the M step. In this paper, we analyze a variant of CD, called Adiabatic Persistent Contrastive Divergence (APCD), which runs a few cycles of MCs in both the E and M steps. Using multi-time-scale stochastic approximation theory, we prove that APCD converges to a correct optimum, whereas standard CD cannot provide such a guarantee due to the mean-field approximation gap in the E step. Despite this stronger theoretical guarantee, a possible practical drawback of APCD is slow mixing in the E step. To address this issue, we also design a hybrid approach that applies both mean-field and MC approximations in the E step; it outperforms the standard mean-field-based CD in our experiments on real-world datasets.
UR - http://www.scopus.com/inward/record.url?scp=85034050147&partnerID=8YFLogxK
U2 - 10.1109/ISIT.2017.8007081
DO - 10.1109/ISIT.2017.8007081
M3 - Conference contribution
AN - SCOPUS:85034050147
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 3005
EP - 3009
BT - 2017 IEEE International Symposium on Information Theory, ISIT 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Symposium on Information Theory, ISIT 2017
Y2 - 25 June 2017 through 30 June 2017
ER -