Source author record

Marimuthu Palaniswami

Marimuthu Palaniswami appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Networking and Internet Architecture; Computational Engineering, Finance, and Science; eess.SY; Information Theory; math.IT; Systems and Control

Catalog footprint

What is connected

10 works

9 topics

4 close collaborators

preprint, 2026, arXiv

Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer

While traditional time-series classifiers assume full sequences at inference, practical constraints (latency and cost) often limit inputs to partial prefixes. The absence of class-discriminative patterns in partial data can significantly hinder a classifier's ability to generalize. This work uses knowledge distillation (KD) to equip partial time-series classifiers with the generalization ability of their full-sequence counterparts. In KD, a high-capacity teacher transfers supervision to aid student learning on the target task. Matching with teacher features has shown promise in closing the generalization gap due to limited parameter capacity. However, when the generalization gap arises from training-data differences (full versus partial), the teacher's full-context features can be an overwhelming target signal for the student's short-context features. To provide progressive, diverse, and collective teacher supervision, we propose Generative Diffusion Prior Distillation (GDPD), a novel KD framework that treats short-context student features as degraded observations of the target full-context features. Inspired by the iterative restoration capability of diffusion models, we learn a diffusion-based generative prior over teacher features. Leveraging this prior, we posterior-sample target teacher representations that could best explain the missing long-range information in the student features and optimize the student features to be minimally degraded relative to these targets. GDPD provides each student feature with a distribution of task-relevant long-context knowledge, which benefits learning on the partial classification task. Extensive experiments across earliness settings, datasets, and architectures demonstrate GDPD's effectiveness for full-to-partial distillation.

preprint, 2026, arXiv

Learning to Reason: Temporal Saliency Distillation for Interpretable Knowledge Transfer

Knowledge distillation has proven effective for model compression by transferring knowledge from a larger network called the teacher to a smaller network called the student. Current knowledge distillation in time series is predominantly based on logit and feature aligning techniques originally developed for computer vision tasks. These methods do not explicitly account for temporal data and fall short in two key aspects. First, the mechanisms by which the transferred knowledge helps the student model's learning process remain unclear due to the uninterpretability of logits and features. Second, these methods transfer only limited knowledge, primarily replicating the teacher's predictive accuracy. As a result, student models often produce predictive distributions that differ significantly from those of their teachers, hindering their safe substitution for teacher models. In this work, we propose transferring interpretable knowledge by extending conventional logit transfer to convey not just the right prediction but also the right reasoning of the teacher. Specifically, we induce other useful knowledge from the teacher logits, termed temporal saliency, which captures the importance of each input timestep to the teacher prediction. By training the student with Temporal Saliency Distillation, we encourage it to make predictions based on the same input features as the teacher. Temporal Saliency Distillation requires no additional parameters or architecture-specific assumptions. We demonstrate that Temporal Saliency Distillation effectively improves the performance of baseline methods while also achieving desirable properties beyond predictive accuracy. We hope our work establishes a new paradigm for interpretable knowledge distillation in time series analysis.
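For orientation, the "conventional logit transfer" that this abstract builds on can be sketched as follows. This is a generic temperature-softened distillation loss, not Temporal Saliency Distillation itself, whose saliency term the abstract does not specify. The function names, the default temperature, and the small `eps` guard are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_kd_loss(student_logits, teacher_logits, temperature=2.0, eps=1e-12):
    """Mean KL(teacher || student) on softened outputs, scaled by temperature squared.

    Hypothetical helper for illustration; eps only guards against log(0).
    """
    p = softmax(teacher_logits, temperature)  # softened teacher distribution
    q = softmax(student_logits, temperature)  # softened student distribution
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(temperature ** 2 * kl.mean())

# Usage: one sample, three classes.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.0, 0.0, 0.0]])
loss = logit_kd_loss(student, teacher)
```

The loss is zero when student and teacher logits agree and positive otherwise, so minimizing it pulls the student's predictive distribution toward the teacher's.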

preprint, 2026, arXiv

MemKD: Memory-Discrepancy Knowledge Distillation for Efficient Time Series Classification

Deep learning models, particularly recurrent neural networks and their variants, such as long short-term memory, have significantly advanced time series data analysis. These models capture complex, sequential patterns in time series, enabling real-time assessments. However, their high computational complexity and large model sizes pose challenges for deployment in resource-constrained environments, such as wearable devices and edge computing platforms. Knowledge Distillation (KD) offers a solution by transferring knowledge from a large, complex model (teacher) to a smaller, more efficient model (student), thereby retaining high performance while reducing computational demands. Current KD methods, originally designed for computer vision tasks, neglect the unique temporal dependencies and memory retention characteristics of time series models. To this end, we propose a novel KD framework termed Memory-Discrepancy Knowledge Distillation (MemKD). MemKD leverages a specialized loss function to capture memory retention discrepancies between the teacher and student models across subsequences within time series data, ensuring that the student model effectively mimics the teacher model's behaviour. This approach facilitates the development of compact, high-performing recurrent neural networks suitable for real-time time series analysis tasks. Our extensive experiments demonstrate that MemKD significantly outperforms state-of-the-art KD methods. It reduces parameter size and memory usage by approximately 500 times while maintaining comparable performance to the teacher model.

preprint, 2022, arXiv

Achieving AI-enabled Robust End-to-End Quality of Experience over Radio Access Networks

Emerging applications such as Augmented Reality, the Internet of Vehicles and Remote Surgery require both computing and networking functions working in harmony. The End-to-end (E2E) quality of experience (QoE) for these applications depends on the synchronous allocation of networking and computing resources. However, the relationship between the resources and the E2E QoE outcomes is typically stochastic and non-linear. In order to make efficient resource allocation decisions, it is essential to model these relationships. This article presents a novel machine-learning based approach to learn these relationships and concurrently orchestrate both resources for this purpose. The machine learning models further help make robust allocation decisions regarding stochastic variations and simplify robust optimization to a conventional constrained optimization. When resources are insufficient to accommodate all application requirements, our framework supports executing some of the applications with minimal degradation (graceful degradation) of E2E QoE. We also show how we can implement the learning and optimization methods in a distributed fashion by the Software-Defined Network (SDN) and Kubernetes technologies. Our results show that deep learning-based modelling achieves E2E QoE with approximately 99.8% accuracy, and our robust joint-optimization technique allocates resources efficiently when compared to existing differential services alternatives.

preprint, 2022, arXiv

Online Slice Reconfiguration for End-to-End QoE in 6G Applications

End-to-end (E2E) quality of experience (QoE) for 6G applications depends on the synchronous allocation of networking and computing resources, also known as slicing. However, the relationship between the resources and the E2E QoE outcomes is typically stochastic and non-stationary. Existing works consider known resource demands for slicing and formulate optimization problems for slice reconfiguration. In this work, we create and manage slices by learning the relationship between E2E QoE and resources. We develop a gradient-based online slice reconfiguration algorithm (OSRA) to reconfigure and manage slices in resource-constrained scenarios for radio access networks (RAN). We observe that our methodology meets the QoE requirements with high accuracy compared to existing approaches. It improves upon the existing approaches by approximately 98% for bursty traffic variations. Our algorithm has fast convergence and achieves low E2E delay violations for lower priority slices.

preprint, 2022, arXiv

Scheduling IoT Applications in Edge and Fog Computing Environments: A Taxonomy and Future Directions

Fog computing, as a distributed paradigm, offers cloud-like services at the edge of the network with low latency and high-access bandwidth to support a diverse range of IoT application scenarios. To fully utilize the potential of this computing paradigm, scalable, adaptive, and accurate scheduling mechanisms and algorithms are required to efficiently capture the dynamics and requirements of users, IoT applications, environmental properties, and optimization targets. This paper presents a taxonomy of recent literature on scheduling IoT applications in Fog computing. Based on our new classification schemes, current works in the literature are analyzed, research gaps of each category are identified, and respective future directions are described.

preprint, 2016, arXiv

Fuzzy c-Shape: A new algorithm for clustering finite time series waveforms

The existence of large volumes of time series data in many applications has motivated data miners to investigate specialized methods for mining time series data. Clustering is a popular data mining method due to its powerful exploratory nature and its usefulness as a preprocessing step for other data mining techniques. This article develops two novel clustering algorithms for time series data that are extensions of a crisp c-shapes algorithm. The two new algorithms are heuristic derivatives of fuzzy c-means (FCM). Fuzzy c-Shapes plus (FCS+) replaces the inner product norm in the FCM model with a shape-based distance function. Fuzzy c-Shapes double plus (FCS++) uses the shape-based distance, and also replaces the FCM cluster centers with shape-extracted prototypes. Numerical experiments on 48 real time series data sets show that the two new algorithms outperform state-of-the-art shape-based clustering algorithms in terms of accuracy and efficiency. Four external cluster validity indices (the Rand index, Adjusted Rand Index, Variation of Information, and Normalized Mutual Information) are used to match candidate partitions generated by each of the studied algorithms. All four indices agree that for these finite waveform data sets, FCS++ gives a small improvement over FCS+, and in turn, FCS+ is better than the original crisp c-shapes method. Finally, we apply two tests of statistical significance to the three algorithms. The Wilcoxon and Friedman statistics both rank the three algorithms in exactly the same way as the four cluster validity indices.
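The idea of swapping the distance inside a fuzzy c-means update can be sketched as below. This is a rough illustration under assumptions, not the paper's FCS+ or FCS++ implementation: `fuzzy_memberships` is the standard FCM membership formula, and `shape_based_distance` assumes that "shape-based distance" means the cross-correlation distance known from k-Shape. Both function names, the fuzzifier default `m=2.0`, and the zero-series convention are hypothetical choices. The shape-extracted prototypes of FCS++ are not shown.

```python
import numpy as np

def shape_based_distance(x, y):
    """1 minus the maximum normalized cross-correlation over all shifts (assumed SBD)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 1.0  # convention chosen here for all-zero series
    cc = np.correlate(x, y, mode="full")  # cross-correlation at every lag
    return float(1.0 - cc.max() / denom)

def fuzzy_memberships(distances, m=2.0, eps=1e-12):
    """Standard FCM membership update from an (n_series, n_clusters) distance matrix."""
    d = np.maximum(np.asarray(distances, dtype=float), eps)  # avoid division by zero
    ratio = d[:, :, None] / d[:, None, :]                    # ratio[i, k, j] = d_ik / d_ij
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)

# Usage: memberships of two series with respect to two prototype waveforms.
grid = np.linspace(0.0, 2.0 * np.pi, 32)
prototypes = [np.sin(grid), np.sign(np.sin(grid))]
series = [np.sin(grid + 0.3), np.sign(np.sin(grid + 0.3))]
dist = np.array([[shape_based_distance(s, p) for p in prototypes] for s in series])
u = fuzzy_memberships(dist)  # each row sums to 1
```

Because the membership update only consumes a distance matrix, any distance function can be substituted without touching the update itself.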

preprint, 2012, arXiv

Cramér-Rao Bounds for Polynomial Signal Estimation using Sensors with AR(1) Drift

We seek to characterize the estimation performance of a sensor network where the individual sensors exhibit the phenomenon of drift, i.e., a gradual change of the bias. Though estimation in the presence of random errors has been extensively studied in the literature, the loss of estimation performance due to systematic errors like drift has rarely been examined. In this paper, we derive a closed-form Fisher information matrix and subsequently Cramér-Rao bounds (up to a reasonable approximation) for the estimation accuracy of drift-corrupted signals. We assume a polynomial time series as the representative signal and an autoregressive process model for the drift. When the Markov parameter for drift ρ<1, we show that the first-order effect of drift is asymptotically equivalent to scaling the measurement noise by an appropriate factor. For ρ=1, i.e., when the drift is non-stationary, we show that the constant part of a signal can only be estimated inconsistently (non-zero asymptotic variance). Practical usage of the results is demonstrated through the analysis of 1) networks with multiple sensors and 2) bandwidth-limited networks communicating only quantized observations.
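The measurement setup described in this abstract (a polynomial signal observed through a sensor with AR(1) drift) can be simulated with a short sketch. The additive white measurement noise, the zero initial drift, the parameter names, and the least-squares fit are assumptions made for illustration. The code does not compute the paper's Fisher information matrix or Cramér-Rao bounds.

```python
import numpy as np

def simulate_sensor(coeffs, n, rho, sigma_w, sigma_n, rng):
    """Polynomial signal plus AR(1) drift plus white measurement noise (assumed model)."""
    t = np.arange(n, dtype=float)
    signal = np.polyval(coeffs, t)            # coeffs ordered highest degree first
    drift = np.zeros(n)                       # drift starts at zero (assumption)
    w = rng.normal(0.0, sigma_w, size=n)
    for k in range(1, n):
        drift[k] = rho * drift[k - 1] + w[k]  # d_k = rho * d_{k-1} + w_k
    noise = rng.normal(0.0, sigma_n, size=n)
    return t, signal + drift + noise

def stationary_drift_variance(rho, sigma_w):
    """Variance of a stationary AR(1) process, defined only for |rho| < 1."""
    if abs(rho) >= 1.0:
        raise ValueError("AR(1) is non-stationary for |rho| >= 1")
    return sigma_w ** 2 / (1.0 - rho ** 2)

# Usage: fit a line to drift-corrupted observations by ordinary least squares.
rng = np.random.default_rng(0)
t, y = simulate_sensor([0.5, 2.0], n=200, rho=0.9, sigma_w=0.1, sigma_n=0.2, rng=rng)
estimate = np.polyfit(t, y, deg=1)            # [slope, intercept] estimates
```

Setting `rho=1.0` turns the drift into a random walk, which is the non-stationary case the abstract singles out. `stationary_drift_variance` raises an error there because no finite stationary variance exists.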

preprint, 2012, arXiv

Internet of Things (IoT): A Vision, Architectural Elements, and Future Directions

Ubiquitous sensing enabled by Wireless Sensor Network (WSN) technologies cuts across many areas of modern-day living. This offers the ability to measure, infer and understand environmental indicators, from delicate ecologies and natural resources to urban environments. The proliferation of these devices in a communicating-actuating network creates the Internet of Things (IoT), wherein sensors and actuators blend seamlessly with the environment around us, and the information is shared across platforms in order to develop a common operating picture (COP). Fuelled by the recent adaptation of a variety of enabling device technologies such as RFID tags and readers, near field communication (NFC) devices and embedded sensor and actuator nodes, the IoT has stepped out of its infancy and is the next revolutionary technology in transforming the Internet into a fully integrated Future Internet. As we move from www (static pages web) to web2 (social networking web) to web3 (ubiquitous computing web), the need for data-on-demand using sophisticated intuitive queries increases significantly. This paper presents a cloud-centric vision for worldwide implementation of the Internet of Things. The key enabling technologies and application domains that are likely to drive IoT research in the near future are discussed. A cloud implementation using Aneka, which is based on the interaction of private and public clouds, is presented. We conclude our IoT vision by expanding on the need for convergence of WSN, the Internet and distributed computing directed at the technological research community.

preprint, 2009, arXiv

Jeeva: Enterprise Grid-enabled Web Portal for Protein Secondary Structure Prediction

This paper presents a Grid portal for protein secondary structure prediction developed by using services of Aneka, a .NET-based enterprise Grid technology. The portal is used by research scientists to discover new prediction structures in a parallel manner. An SVM (Support Vector Machine)-based prediction algorithm is used with 64 sample protein sequences as a case study to demonstrate the potential of enterprise Grids.
