TY - JOUR
T1 - On Cost-Efficient Learning of Data Dependency
AU - Jang, Hyeryung
AU - Song, Hyungseok
AU - Yi, Yung
N1 - Publisher Copyright:
© 1993-2012 IEEE.
PY - 2022/6/1
Y1 - 2022/6/1
N2 - In this paper, we consider the problem of learning a tree graph structure that represents the statistical data dependency among nodes for a set of data samples generated by nodes, which provides the basic structure to perform a probabilistic inference task. Inference in the data graph includes marginal inference and maximum a posteriori (MAP) estimation, and belief propagation (BP) is a commonly used algorithm to compute the marginal distribution of nodes via message-passing, incurring non-negligible amount of communication cost. We inevitably have the trade-off between the inference accuracy and the message-passing cost because the learned structure of data dependency and physical connectivity graph are often highly different. In this paper, we formalize this trade-off in an optimization problem which outputs the data dependency graph that jointly considers learning accuracy and message-passing costs. We focus on two popular implementations of BP, ASYNC-BP and SYNC-BP, which have different message-passing mechanisms and cost structures. In ASYNC-BP, we propose a polynomial-time learning algorithm that is optimal, motivated by finding a maximum weight spanning tree of a complete graph. In SYNC-BP, we prove the NP-hardness of the problem and propose a greedy heuristic. For both BP implementations, we quantify how the error probability that the learned cost-efficient data graph differs from the ideal one decays as the number of data samples grows, using the large deviation principle, which provides a guideline on how many samples are necessary to obtain a certain trade-off. We validate our theoretical findings through extensive simulations, which confirms that it has a good match.
AB - In this paper, we consider the problem of learning a tree graph structure that represents the statistical data dependency among nodes for a set of data samples generated by nodes, which provides the basic structure to perform a probabilistic inference task. Inference in the data graph includes marginal inference and maximum a posteriori (MAP) estimation, and belief propagation (BP) is a commonly used algorithm to compute the marginal distribution of nodes via message-passing, incurring non-negligible amount of communication cost. We inevitably have the trade-off between the inference accuracy and the message-passing cost because the learned structure of data dependency and physical connectivity graph are often highly different. In this paper, we formalize this trade-off in an optimization problem which outputs the data dependency graph that jointly considers learning accuracy and message-passing costs. We focus on two popular implementations of BP, ASYNC-BP and SYNC-BP, which have different message-passing mechanisms and cost structures. In ASYNC-BP, we propose a polynomial-time learning algorithm that is optimal, motivated by finding a maximum weight spanning tree of a complete graph. In SYNC-BP, we prove the NP-hardness of the problem and propose a greedy heuristic. For both BP implementations, we quantify how the error probability that the learned cost-efficient data graph differs from the ideal one decays as the number of data samples grows, using the large deviation principle, which provides a guideline on how many samples are necessary to obtain a certain trade-off. We validate our theoretical findings through extensive simulations, which confirms that it has a good match.
KW - belief propagation
KW - distributed inference
KW - Graph structure learning
KW - large deviation principle
KW - sample complexity
UR - http://www.scopus.com/inward/record.url?scp=85123358934&partnerID=8YFLogxK
U2 - 10.1109/TNET.2022.3141128
DO - 10.1109/TNET.2022.3141128
M3 - Article
AN - SCOPUS:85123358934
SN - 1063-6692
VL - 30
SP - 1382
EP - 1394
JO - IEEE/ACM Transactions on Networking
JF - IEEE/ACM Transactions on Networking
IS - 3
ER -