Source author record

Poulami Das

Poulami Das appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: quant-ph; Computation and Language; Computer Vision; Hardware Architecture; Artificial Intelligence; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Emerging Technologies; Information Retrieval; Machine Learning; Multimedia

Catalog footprint: 8 works, 11 topics, 4 close collaborators


Preprint, 2026, arXiv

GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems

Mixture-of-Experts (MoE) models enable efficient inference by employing smaller experts and activating only a subset of them per token. MoE serving engines distribute experts across multiple GPUs and, at inference time, route each token to the GPUs that host the experts it activates. They process tokens in lock-step: all tokens in a batch must finish a layer before any of them proceeds to the next. This synchronization barrier is a critical bottleneck because the performance of MoE models is limited by the straggler GPU that finishes last. Stragglers emerge when too many heavily used experts are placed on the same GPU or on the slowest GPU. While prior works place experts to balance token loads across GPUs, they all overlook GPU variability and often place highly used experts on the slowest GPUs. We propose GEM, GPU-variability-aware Expert Mapping, a framework for mapping experts to GPUs in MoE models. GEM exploits two insights. First, we must place experts such that each GPU receives a non-uniform token load matched to its variability, so that all GPUs finish processing a layer at about the same time. Our studies show that there are two types of experts: consistent experts, which are used most of the time, and temporal experts, which are often used together for the remaining time. Our second insight is that we must place simultaneously used consistent and temporal experts on different GPUs and avoid placing them on slower GPUs to reduce slowdown. GEM gathers the variability profile of GPUs for each model and task and uses the token load distributions per task to map experts to GPUs. Our experiments show that GEM improves end-to-end latency by 7.9% on average and by up to 16.5% compared to the baseline.
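The first insight (non-uniform loads so that all GPUs finish together) can be illustrated with a simple greedy heuristic. This is only a sketch under stated assumptions, not GEM's actual algorithm: the function name `map_experts`, the relative `gpu_speeds` values, and the per-expert token loads are all hypothetical, and the heuristic ignores the consistent/temporal co-activation insight.

```python
def map_experts(expert_loads, gpu_speeds):
    """Greedy sketch: place the heaviest expert first, each time on the GPU
    whose projected finish time (tokens / relative speed) would be lowest.

    expert_loads: dict of expert name -> expected tokens per layer (assumed)
    gpu_speeds:   list of relative GPU throughputs, 1.0 = nominal (assumed)
    """
    tokens = [0.0] * len(gpu_speeds)  # tokens assigned to each GPU so far
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        gpu = min(
            range(len(gpu_speeds)),
            key=lambda g: (tokens[g] + load) / gpu_speeds[g],
        )
        tokens[gpu] += load
        placement[expert] = gpu
    finish_times = [t / s for t, s in zip(tokens, gpu_speeds)]
    return placement, finish_times


# Hypothetical example: GPU 1 runs at half the speed of GPU 0.
placement, finish_times = map_experts(
    {"e0": 8, "e1": 4, "e2": 3, "e3": 1}, gpu_speeds=[1.0, 0.5]
)
```

In this toy setup the slower GPU ends up with fewer tokens, which brings the two finish times close together instead of equalizing raw token counts.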

Preprint, 2026, arXiv

Test-Time Speculation

Speculative decoding accelerates LLM inference by using a fast draft model to generate tokens and a more accurate target model to verify them. Its performance depends on the $\textit{acceptance length}$, or number of draft tokens accepted by the target. Our studies show that the acceptance length of even state-of-the-art speculators, such as DFlash, EAGLE-3, and PARD, degrades with generation length, reaching values close to 1 (i.e., no speedup) within just a few thousand output tokens, which makes speculators ineffective for long-response tasks. Acceptance lengths decline because most speculators are trained offline on short sequences but are forced to match the target model on much longer outputs at inference, well beyond their training distribution. To address this issue, we propose $\textit{Test-Time Speculation (TTS)}$, an online distillation approach that continuously adapts the speculator at test time. TTS leverages the key insight that the token verification step already invokes the target model for each draft token, providing the training signal needed to adapt the draft at no additional cost. Treating the draft as the student and the target as the teacher, TTS adjusts the draft over several speculation rounds, with each update improving the draft's accuracy as generation proceeds. Our results across multiple models from the Qwen-3, Qwen-3.5, and Llama3.1 families show that TTS improves acceptance lengths over state-of-the-art speculators by up to $72\%$ and by $41\%$ on average, with the benefits scaling with increased generation lengths.
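To make the acceptance-length metric concrete, the sketch below counts how many draft tokens survive verification in one round. It assumes greedy verification, where a draft token is accepted only if it equals the target model's own choice at that position; the function names are hypothetical. The abstract's remark that a value close to 1 means no speedup suggests a convention that also counts the one token the target always contributes per round, which `tokens_per_round` models, but that reading is an assumption.

```python
def accepted_prefix(draft_tokens, target_tokens):
    """Number of leading draft tokens that match the target's tokens."""
    count = 0
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            break  # the first mismatch rejects this and all later draft tokens
        count += 1
    return count


def tokens_per_round(draft_tokens, target_tokens):
    """Accepted draft tokens plus the one token the target supplies itself
    (assumed convention: a value of 1 means the draft contributed nothing)."""
    return accepted_prefix(draft_tokens, target_tokens) + 1


# Hypothetical token ids: the third draft token disagrees with the target.
draft = [5, 7, 9, 2]
target = [5, 7, 1, 2]
round_length = tokens_per_round(draft, target)
```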

Preprint, 2022, arXiv

ForeSight: Reducing SWAPs in NISQ Programs via Adaptive Multi-Candidate Evaluations

Near-term quantum computers are noisy and have limited connectivity between qubits. Compilers are required to introduce SWAP operations in order to perform two-qubit gates between non-adjacent qubits. SWAPs increase the number of gates and the depth of programs, making them even more vulnerable to errors. Moreover, they relocate qubits, which affects SWAP selections for future gates in a program. Thus, compilers must select SWAP routes that minimize the overheads not only for the current operation, but also for future gates. Existing compilers tend to select paths with the fewest SWAPs for the current operations, but do not evaluate the impact of the relocations from the selected SWAP candidate on future SWAPs. Also, they converge on SWAP candidates for the current operation and only then decide SWAP routes for future gates, thus severely restricting the SWAP candidate search space for future operations. We propose ForeSight, a compiler that simultaneously evaluates multiple SWAP candidates for several operations into the future, delays SWAP selections to analyze their impact on future SWAP decisions, and avoids early convergence on sub-optimal candidates. Moreover, ForeSight evaluates slightly longer SWAP routes for current operations if they have the potential to reduce SWAPs for future gates, thus reducing SWAPs for the program globally. As compilation proceeds, ForeSight dynamically adds new SWAP candidates to the solution space and eliminates the weaker ones. This allows ForeSight to reduce SWAP overheads at the program level while keeping the compilation complexity tractable. Our evaluations with a hundred benchmarks across three devices show that ForeSight reduces SWAP overheads by 17% on average and 81% in the best case, compared to the baseline. ForeSight takes minutes, making it scalable to large programs.
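The cost that such compilers reason about can be sketched for a single gate: on a coupling graph, two qubits at shortest-path distance d need at least d - 1 SWAPs before a two-qubit gate can act on them. The snippet below computes only that per-gate lower bound with a breadth-first search; it is not ForeSight's multi-candidate search, and the function name and the five-qubit line topology are assumptions for illustration.

```python
from collections import deque


def swaps_needed(coupling, a, b):
    """Minimum SWAPs to make physical qubits a and b adjacent.

    coupling: dict mapping each qubit to the list of its neighbours.
    """
    distance = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            # Adjacent qubits (distance 1) need no SWAP at all.
            return max(distance[node] - 1, 0)
        for neighbour in coupling[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    raise ValueError("qubits are not connected")


# Hypothetical device: five qubits in a line, 0-1-2-3-4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cost = swaps_needed(line, 0, 4)
```

Several different SWAP routes can reach this minimum, and each leaves the qubits in different positions, which is why the choice affects the cost of later gates.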

Preprint, 2022, arXiv

HAMMER: boosting fidelity of noisy Quantum circuits by exploiting Hamming behavior of erroneous outcomes

Quantum computers with hundreds of qubits will be available soon. Unfortunately, high device error-rates pose a significant challenge in using these near-term quantum systems to power real-world applications. Executing a program on existing quantum systems generates both correct and incorrect outcomes, but often, the output distribution is too noisy to distinguish between them. In this paper, we show that erroneous outcomes are not arbitrary but exhibit a well-defined structure when represented in the Hamming space. Our experiments on IBM and Google quantum computers show that the most frequent erroneous outcomes are more likely to be close in the Hamming space to the correct outcome. We exploit this behavior to improve the ability to infer the correct outcome. We propose Hamming Reconstruction (HAMMER), a post-processing technique that leverages the observation of Hamming behavior to reconstruct the noisy output distribution, such that the resulting distribution has higher fidelity. We evaluate HAMMER using experimental data from Google and IBM quantum computers with more than 500 unique quantum circuits and obtain an average improvement of 1.37x in the quality of solution. On Google's publicly available QAOA datasets, we show that HAMMER sharpens the gradients on the cost function landscape.
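The observation that frequent erroneous outcomes cluster near the correct one in Hamming space can be illustrated with a toy reweighting. This is a sketch of the general idea only, not the HAMMER weighting scheme: the scoring rule (probability times the total probability within a Hamming radius), the `radius` parameter, and the example distribution are all assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bitstrings differ."""
    return sum(x != y for x, y in zip(a, b))


def reweight(distribution, radius=1):
    """Boost outcomes that have many probable neighbours in Hamming space."""
    scores = {}
    for outcome, prob in distribution.items():
        support = sum(
            other_prob
            for other, other_prob in distribution.items()
            if hamming(outcome, other) <= radius
        )
        scores[outcome] = prob * support
    total = sum(scores.values())
    return {outcome: score / total for outcome, score in scores.items()}


# Hypothetical noisy output: "111" is correct and its one-bit-flip errors
# are common, while "000" is an isolated wrong outcome of equal frequency.
noisy = {"111": 0.30, "110": 0.15, "101": 0.15, "011": 0.10, "000": 0.30}
cleaned = reweight(noisy)
```

In this example the two outcomes tied at 0.30 are separated after reweighting, because only "111" is surrounded by probable neighbours.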

Preprint, 2020, arXiv

A Scalable Decoder Micro-architecture for Fault-Tolerant Quantum Computing

Quantum computation promises significant computational advantages over classical computation for some problems. However, quantum hardware suffers from much higher error rates than classical hardware. As a result, extensive quantum error correction is required to execute a useful quantum algorithm. The decoder is a key component of the error correction scheme: its role is to identify errors faster than they accumulate in the quantum computer, and it must be implemented with minimal hardware resources in order to scale to the regime of practical applications. In this work, we consider surface code error correction, which is the most popular family of error-correcting codes for quantum computing, and we design a decoder micro-architecture for the Union-Find decoding algorithm. We propose a three-stage, fully pipelined hardware implementation that significantly speeds up the decoder. Then, we optimize the amount of decoding hardware required to perform error correction simultaneously over all the logical qubits of the quantum computer. By sharing resources between logical qubits, we reduce the number of hardware units by 67% and the memory capacity by 70%. Moreover, we reduce the bandwidth required for the decoding process by a factor of at least 30x using low-overhead compression algorithms. Finally, we provide numerical evidence that our optimized micro-architecture can be executed fast enough to correct errors in a quantum computer.
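The Union-Find decoding algorithm takes its name from the disjoint-set (union-find) data structure, which tracks which elements belong to the same cluster. The sketch below shows only that generic data structure, with path compression and union by size; it is not the decoder or the pipelined micro-architecture, and the class name and the example cluster indices are assumptions.

```python
class DisjointSet:
    """Generic union-find over the integers 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.size = [1] * n

    def find(self, x):
        """Return the root of x's cluster, compressing the path on the way."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the clusters of a and b; return False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a  # attach the smaller tree to the larger
        self.size[root_a] += self.size[root_b]
        return True


# Hypothetical example: six elements grown into two clusters.
clusters = DisjointSet(6)
clusters.union(0, 1)
clusters.union(1, 2)
clusters.union(4, 5)
```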

Preprint, 2013, arXiv

Embedding of Blink Frequency in Electrooculography Signal using Difference Expansion based Reversible Watermarking Technique

In the past few years, the rapid expansion of digitization and globalization has influenced the medical field as it has other fields. To improve diagnostic results, most reputed hospitals and diagnostic centres around the world have started exchanging medical information. In the proposed method, diagnostic parametric values calculated from the original Electrooculography (EOG) signal are embedded as a watermark using a reversible watermarking technique based on the Difference Expansion (DE) algorithm. The extracted watermark provides the required parametric values at the recipient end without any further computation on the recovered EOG signal. The integrity of the extracted watermark can be validated by computing the parametric values from the recovered signal. The time-domain features of the EOG signal are calculated to generate the watermark. In the current work, various features are studied, and two major features related to blink frequency are used to generate the watermark. The high Signal to Noise Ratio (SNR) and the Bit Error Rate (BER) results support the robustness of the proposed method.
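The reversibility of Difference Expansion can be shown on a single pair of integer samples. The sketch below follows the standard integer-average and difference formulation of DE; it is not taken from this paper, it omits the overflow and location-map handling that a practical scheme needs, and the function names and sample values are assumptions.

```python
def de_embed(x, y, bit):
    """Hide one bit in the pair (x, y) by expanding their difference."""
    average = (x + y) // 2          # integer average, preserved by embedding
    difference = x - y
    expanded = 2 * difference + bit  # the bit becomes the new least significant bit
    return average + (expanded + 1) // 2, average - expanded // 2


def de_extract(x_marked, y_marked):
    """Recover the hidden bit and the original pair from a marked pair."""
    average = (x_marked + y_marked) // 2
    expanded = x_marked - y_marked
    bit = expanded & 1
    difference = expanded // 2      # floor division undoes the expansion
    x = average + (difference + 1) // 2
    y = average - difference // 2
    return bit, x, y


# Hypothetical pair of neighbouring samples carrying the bit 1.
marked = de_embed(206, 201, 1)
recovered = de_extract(*marked)
```

Because extraction returns both the bit and the original pair, the host signal is restored exactly, which is what makes the watermark reversible.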

Preprint, 2013, arXiv

Multilevel Threshold Based Gray Scale Image Segmentation using Cuckoo Search

Image segmentation is a technique for partitioning an image into distinct classes. Many solutions may be available for segmenting an image into a given number of classes, each with a different segmentation quality. In our proposed method, a multilevel thresholding technique is used for image segmentation, and a new Cuckoo Search (CS) approach is used to select the optimal threshold values. In other words, the algorithm searches for the best solution starting from random initial threshold values, and a correlation function is used to evaluate the quality of each solution. Finally, MSE and PSNR are measured to assess the segmentation quality.
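The evaluation step can be sketched as follows: apply a set of thresholds to gray-level pixels, then compare the segmented result with the original using MSE and PSNR. The threshold values, the representative gray level per class, and the function names are assumptions for illustration; the Cuckoo Search that would choose the thresholds is not shown.

```python
import math
from bisect import bisect_right


def segment(pixels, thresholds, levels):
    """Map each pixel to the gray level of its class.

    thresholds must be sorted; len(levels) == len(thresholds) + 1.
    A pixel equal to a threshold falls into the upper class.
    """
    return [levels[bisect_right(thresholds, p)] for p in pixels]


def mse(original, processed):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)


def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio in dB for the given peak pixel value."""
    error = mse(original, processed)
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)


# Hypothetical 8-bit pixels split into three classes by two thresholds.
pixels = [10, 100, 200]
segmented = segment(pixels, thresholds=[85, 170], levels=[42, 128, 213])
quality = psnr(pixels, segmented)
```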

Preprint, 2013, arXiv

Odd-Even Embedding Scheme Based Modified Reversible Watermarking Technique using Blueprint

Digital watermarking is a technique for adding or hiding information in multimedia content in order to identify the owner of the data. A signal or digital image can be permanently embedded in other digital data, providing a good way to protect intellectual property from illegal replication. The cover data transmitted over the internet hides the watermark through a computer-aided assertion method such that it becomes undetectable. The watermark then stands as a hindrance to many operations without harming the host document in which it is embedded. Unfortunately, many owners of digital materials such as images, text, audio, and video are reluctant to spread their documents on the web or other networked environments, because the ease of duplicating digital materials facilitates copyright violation. Digital media distribution occurs through various channels. The cover data may or may not hold any relation to the watermark information. In the last two decades, a considerable amount of research has been done on the digital watermarking of multimedia files such as audio, video, images, and text. Different types of watermarking algorithms have been proposed by researchers to achieve a high level of security and authenticity. In our proposed method, a modified reversible watermarking technique is introduced, which generates a blueprint of the original image based on an odd-even embedding methodology to yield large data hiding capacity, security, and high watermarked quality. The experimental results demonstrate that, no matter how much secret data is embedded, the watermarked quality is about 51 dB in the proposed scheme.
