TY - JOUR
T1 - Fair Clustering with Fair Correspondence Distribution
AU - Lee, Woojin
AU - Ko, Hyungjin
AU - Byun, Junyoung
AU - Yoon, Taeho
AU - Lee, Jaewook
N1 - Publisher Copyright: © 2021 Elsevier Inc.
PY - 2021/12
Y1 - 2021/12
N2 - In recent years, the issue of fairness has become important in the field of machine learning. In clustering problems, fairness is defined in terms of consistency in that the balance ratio of data with different sensitive attribute values remains constant for each cluster. Fairness problems are important in real-world applications. For example, when a recommendation system provides targeted advertisements or job offers based on the clustering result of candidates, the minority group may not get the same level of opportunity as the majority group if the clustering result is unfair. In this study, we propose a novel distribution-based fair clustering approach. Considering a distribution in which the sample is biased by society, we try to find clusters from a fair correspondence distribution. Our method uses the support vector method and a dynamical system to comprehensively divide the entire data space into atomic cells before reassembling them fairly to form the clusters. Theoretical results derive the upper bound of the generalization error of the corresponding clustering function in the fair correspondence distribution when atomic cells are connected fairly, allowing us to present an algorithm to achieve fairness. Experimental results show that our algorithm increases fairness while reducing computation time for various datasets.
KW - Fair clustering
KW - Fair distribution
KW - Support vector clustering
UR - http://www.scopus.com/inward/record.url?scp=85115424874&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2021.09.010
DO - 10.1016/j.ins.2021.09.010
M3 - Article
AN - SCOPUS:85115424874
SN - 0020-0255
VL - 581
SP - 155
EP - 178
JO - Information Sciences
JF - Information Sciences
ER -