INSTITUTIONAL DIGITAL REPOSITORY

MFC: A Multishot Approach to Federated Data Clustering

Show simple item record

dc.contributor.author Makkar, J
dc.contributor.author Bhumika
dc.contributor.author Jain, Shweta
dc.contributor.author Gupta, S
dc.date.accessioned 2024-05-29T14:00:15Z
dc.date.available 2024-05-29T14:00:15Z
dc.date.issued 2024-05-29
dc.identifier.uri http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4565
dc.description.abstract Abstract The work explores the federated data clustering problem. The primary goal is to perform k-means clustering of data distributed over multiple clients while preserving privacy during an exchange with the central server. Existing solutions to unsupervised federated data clustering are either computationally challenging or effective only in heterogeneous regimes, i.e., when the number of clusters per client (kz) is less than the total number of clusters (k) (specifically, kz ≤ unmapped: inline-formula unmapped: math unmapped: msqrt unmapped: mrow unmapped: mi k). Moreover, existing one-shot approaches assume that the information about kz is available for each client. In this paper, we propose two multi-shot approaches which we call MFC and MFCH, that perform well on both heterogeneous and non-heterogeneous regimes, i.e., are independent of the underlying client data distribution. Both MFC and MFCH stand out as they do not rely on prior knowledge about kz. We theoretically bound the closeness of the local centers obtained by MFC to that of the optimal global centers and prove that under some well-separability assumption, the centers will be close enough. MFCH improvises MFC by only sharing a single cluster center from each client, thus ensuring more privacy. Our theoretical analysis shows that when at least O(k2log k) clients are involved, centers obtained by MFCH will closely approximate optimal global centers. Experiments on synthetic and real-world datasets validate the proposed approaches’ efficacy showcasing lower objective costs in non-heterogeneous regimes while having comparable performance in heterogeneous regimes. In addition, as a byproduct MFC exhibits higher device-level fairness in terms of the individual objective cost compared to existing state-of-the-art algorithms. The code is publicly available at https://github.com/shivi98g/MFC. en_US
dc.language.iso en_US en_US
dc.title MFC: A Multishot Approach to Federated Data Clustering en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Advanced Search

Browse

My Account