Source author record

Aritra Dutta

Aritra Dutta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Distributed, Parallel, and Cluster Computing math.NA Numerical Analysis Artificial Intelligence Computational Complexity Computer Vision Networking and Internet Architecture

Catalog footprint

What is connected

6works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

M$^2$FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices

Accurate air quality prediction is essential for public health, environmental monitoring, and industrial safety. However, most existing approaches rely on centralized learning paradigms, which introduce challenges related to scalability, privacy preservation, and communication overhead in distributed Internet of Things (IoT) environments. Moreover, current federated learning (FL) based solutions predominantly utilize unimodal data, limiting their capability to capture complex environmental patterns. To address these limitations, we propose M$^2$FedAQI, a lightweight multimodal federated framework for decentralized Air Quality Index (AQI) prediction across heterogeneous edge devices. The proposed framework integrates visual and tabular modalities through a feature modulation based fusion mechanism that enables efficient cross-modal interaction while maintaining low computational overhead. M$^2$FedAQI is evaluated on two benchmark datasets, PM25Vision and TRAQID, for both classification and regression tasks under centralized and federated settings. Experimental results demonstrate that M$^2$FedAQI consistently outperforms existing approaches, achieving improvements of up to 11.0\% in Accuracy, 3.53\% in AUC, 12.2\% in F1-score, and 18.0\% in $R^2$, while reducing MAE and RMSE by up to 25.4\% and 20.4\%, respectively, compared with the strongest baselines. Furthermore, deployment on heterogeneous edge devices demonstrates efficient resource utilization in terms of communication overhead, memory footprint, and computational cost. To enhance communication security, TLS-based authentication is incorporated to ensure secure client participation and protect the FL communication channel from unauthorized third-party access without modifying the underlying FL protocol.

preprint2022arXiv

An Adaptive Rank Continuation Algorithm for General Weighted Low-rank Recovery

This paper is devoted to proposing a general weighted low-rank recovery model and designing a fast SVD-free computational scheme to solve it. First, our generic weighted low-rank recovery model unifies several existing approaches in the literature.~Moreover, our model readily extends to the non-convex setting. Algorithm-wise, most first-order proximal algorithms in the literature for low-rank recoveries require computing singular value decomposition (SVD). As SVD does not scale appropriately with the dimension of the matrices, these algorithms become slower when the problem size becomes larger. By incorporating the variational formulation of the nuclear norm into the sub-problem of proximal gradient descent, we avoid computing SVD, which results in significant speed-up. Moreover, our algorithm preserves the rank identification property of nuclear norm [33] which further allows us to design a rank continuation scheme that asymptotically achieves the minimal iteration complexity. Numerical experiments on both toy examples and real-world problems, including structure from motion (SfM) and photometric stereo, background estimation, and matrix completion, demonstrate the superiority of our proposed algorithm.

preprint2022arXiv

Personalized Federated Learning with Communication Compression

In contrast to training traditional machine learning (ML) models in data centers, federated learning (FL) trains ML models over local datasets contained on resource-constrained heterogeneous edge devices. Existing FL algorithms aim to learn a single global model for all participating devices, which may not be helpful to all devices participating in the training due to the heterogeneity of the data across the devices. Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only. They derived a new algorithm, called Loopless Gradient Descent (L2GD), to solve it and showed that this algorithms leads to improved communication complexity guarantees in regimes when more personalization is required. In this paper, we equip their L2GD algorithm with a bidirectional compression mechanism to further reduce the communication bottleneck between the local devices and the server. Unlike other compression-based algorithms used in the FL-setting, our compressed L2GD algorithm operates on a probabilistic communication protocol, where communication does not happen on a fixed schedule. Moreover, our compressed L2GD algorithm maintains a similar convergence rate as vanilla SGD without compression. To empirically validate the efficiency of our algorithm, we perform diverse numerical experiments on both convex and non-convex problems and using various compression techniques.

preprint2021arXiv

DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning

Sparse tensors appear frequently in distributed deep learning, either as a direct artifact of the deep neural network's gradients, or as a result of an explicit sparsification process. Existing communication primitives are agnostic to the peculiarities of deep learning; consequently, they impose unnecessary communication overhead. This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored for distributed deep learning. DeepReduce decomposes sparse tensors in two sets, values and indices, and allows both independent and combined compression of these sets. We support a variety of common compressors, such as Deflate for values, or run-length encoding for indices. We also propose two novel compression schemes that achieve superior results: curve fitting-based for values and bloom filter-based for indices. DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end-user, to significantly lower the communication overhead. As proof of concept, we implement our approach on Tensorflow and PyTorch. Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods, without affecting the training accuracy.

preprint2020arXiv

A Nonconvex Projection Method for Robust PCA

Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components. In this paper, we propose a nonconvex feasibility reformulation of RPCA problem and apply an alternating projection method to solve it. To the best of our knowledge, we are the first to propose a method that solves RPCA problem without considering any objective function, convex relaxation, or surrogate convex constraints. We demonstrate through extensive numerical experiments on a variety of applications, including shadow removal, background estimation, face detection, and galaxy evolution, that our approach matches and often significantly outperforms current state-of-the-art in various ways.

preprint2020arXiv

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

In the realm of big data and machine learning, data-parallel, distributed stochastic algorithms have drawn significant attention in the present days.~While the synchronous versions of these algorithms are well understood in terms of their convergence, the convergence analyses of their asynchronous counterparts are not widely studied. In this paper, we propose and analyze a {\it distributed, asynchronous parallel} SGD in light of solving an arbitrary consistent linear system by reformulating the system into a stochastic optimization problem as studied by Richtárik and Takác in [35]. We compare the convergence rates of our asynchronous SGD algorithm with the synchronous parallel algorithm proposed by Richtárik and Takáč in [35] under different choices of the hyperparameters---the stepsize, the damping factor, the number of processors, and the delay factor. We show that our asynchronous parallel SGD algorithm also enjoys a global linear convergence rate, similar to the {\em basic} method and the synchronous parallel method in [35] for solving any arbitrary consistent linear system via stochastic reformulation. We also show that our asynchronous parallel SGD improves upon the {\em basic} method with a better convergence rate when the number of processors is larger than four. We further show that this asynchronous approach performs asymptotically better than its synchronous counterpart for certain linear systems. Moreover, for certain linear systems, we compute the minimum number of processors required for which our asynchronous parallel SGD is better, and find that this number can be as low as two for some ill-conditioned problems.

Aritra Dutta

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

M$^2$FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices

An Adaptive Rank Continuation Algorithm for General Weighted Low-rank Recovery

Personalized Federated Learning with Communication Compression

DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning

A Nonconvex Projection Method for Robust PCA

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems