Fair Clustering with Fair Correspondence Distribution

Woojin Lee, Hyungjin Ko, Junyoung Byun, Taeho Yoon, Jaewook Lee

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

In recent years, fairness has become an important issue in machine learning. In clustering, fairness is defined as consistency: the balance ratio of data with different sensitive attribute values should remain constant across clusters. Fairness matters in real-world applications. For example, when a recommendation system provides targeted advertisements or job offers based on a clustering of candidates, the minority group may not receive the same level of opportunity as the majority group if the clustering is unfair. In this study, we propose a novel distribution-based fair clustering approach. Assuming that the observed sample comes from a distribution biased by society, we seek clusters from a fair correspondence distribution. Our method uses the support vector method and a dynamical system to partition the entire data space into atomic cells, which are then reassembled fairly to form the clusters. We derive an upper bound on the generalization error of the corresponding clustering function under the fair correspondence distribution when atomic cells are connected fairly, which leads to an algorithm for achieving fairness. Experimental results show that our algorithm increases fairness while reducing computation time on various datasets.
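The balance-based notion of fairness described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it only measures how evenly the sensitive-attribute groups are represented in each cluster, using the common min/max group-count ratio as an assumed definition of balance. The function names `cluster_balance` and `overall_balance` and the toy data are hypothetical.

```python
from collections import Counter


def cluster_balance(labels, groups):
    """Return {cluster: balance}, where balance is the ratio of the smallest
    to the largest sensitive-group count inside that cluster.

    Assumption: this min/max ratio is one common way to quantify balance;
    the paper may use a different formulation.
    """
    all_groups = sorted(set(groups))
    counts = {}
    for cluster, group in zip(labels, groups):
        counts.setdefault(cluster, Counter())[group] += 1

    balance = {}
    for cluster, counter in counts.items():
        # Groups absent from a cluster count as zero, giving balance 0.
        sizes = [counter.get(g, 0) for g in all_groups]
        balance[cluster] = min(sizes) / max(sizes)
    return balance


def overall_balance(labels, groups):
    """Balance of the whole clustering: the worst balance over all clusters."""
    return min(cluster_balance(labels, groups).values())


# Toy data (hypothetical): eight points, two clusters, two groups.
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
example_groups = ["a", "b", "a", "b", "a", "a", "a", "b"]

per_cluster = cluster_balance(example_labels, example_groups)
worst = overall_balance(example_labels, example_groups)
```

In this toy case cluster 0 holds two points from each group, while cluster 1 holds three points from group "a" and one from group "b", so the clustering as a whole is only as balanced as its least balanced cluster.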

Original language: English
Pages (from-to): 155-178
Number of pages: 24
Journal: Information Sciences
Volume: 581
State: Published - Dec 2021

Keywords

  • Fair clustering
  • Fair distribution
  • Support vector clustering
