Source author record

Amlan Chakrabarti

Amlan Chakrabarti appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Hardware Architecture, quant-ph, Computer Vision, Emerging Technologies, Artificial Intelligence, Cryptography and Security, Databases, Applications, Computation and Language, cs.CY, Data Structures and Algorithms, hep-ex, Machine Learning, Other Computer Science, physics.ins-det, Sound

Catalog footprint

What is connected

39 works

16 topics

4 close collaborators

preprint, 2026, arXiv

A New Technique for AI Explainability using Feature Association Map

Lack of transparency in AI systems poses challenges in critical real-life applications. It is important to be able to explain the decisions of an AI system to ensure trust in the system. Explainable AI (XAI) algorithms play a vital role in achieving this objective. In this paper, we propose a new algorithm for explaining AI systems, FAMeX (Feature Association Map based eXplainability). The proposed algorithm is based on a graph-theoretic formulation of the feature set termed the Feature Association Map (FAM). The modelling is founded on associations between features. The proposed FAMeX algorithm has been found to perform better than the competing XAI algorithms Permutation Feature Importance (PFI) and SHapley Additive exPlanations (SHAP). Experiments conducted with eight benchmark algorithms show that FAMeX gauges feature importance in the context of classification better than the competing algorithms. This shows that FAMeX is a promising algorithm for explaining the predictions of an AI system.
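
For orientation, Permutation Feature Importance (PFI), one of the two baselines named in this abstract, can be sketched in a few lines. This is not FAMeX, whose construction the abstract does not spell out. The toy model, the dataset and every function name below are assumptions made purely for illustration.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:col] + (v,) + row[col + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy setup: the "model" only ever looks at feature 0.
rows = [(i / 20, (i * 7 % 20) / 20) for i in range(20)]
labels = [x0 > 0.5 for x0, _ in rows]
model = lambda row: row[0] > 0.5

scores = permutation_importance(model, rows, labels)
```

Because the toy model ignores feature 1, shuffling that column cannot change any prediction, so its score stays at zero while feature 0 receives a positive score.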

preprint, 2026, arXiv

PHISHREV: A Hybrid Machine Learning and Post-Hoc Non-monotonic Reasoning Framework for Context-Aware Phishing Website Classification

Phishing detection systems predominantly rely on statistical machine learning models, which often lack contextual reasoning and are vulnerable to adversarial manipulation. In this work, we propose a hybrid framework that integrates machine learning classifiers with non-monotonic reasoning using Answer Set Programming (ASP) to enable context-aware decision refinement. The proposed post-hoc reasoning layer incorporates expert knowledge to revise classifier predictions through formal belief revisions. Experimental results indicate that the reasoning module modifies 5.08% of classifier outputs, leading to improved decision consistency. A key advantage is that new domain knowledge can be incorporated into the reasoning layer in $\mathcal{O}(n)$ time, eliminating the need for model retraining.

preprint, 2025, arXiv

From Transformers to LLMs: A Systematic Survey of Efficiency Considerations in NLP

The emergence of Transformer-based Large Language Models (LLMs) has substantially augmented the capabilities of Natural Language Processing (NLP), thereby intensifying the demand for computational resources. Therefore, enhancing efficiency based on factors like computational requirements, energy consumption, carbon footprint and financial cost has become a vital area of research. This motivates us to conduct a systematic literature review on Transformer-based LLMs in NLP from the perspective of efficiency. In this survey of 312 articles published between the years 2011 and 2025, efficiency-improvement endeavors are systematically discussed, targeting various aspects such as data curation, model design, model downsizing, and dynamic inferencing. This is augmented with efficiency considerations in model adaptation strategies like pre-training, fine-tuning, prompt-engineering and Retrieval-Augmented Generation (RAG). Furthermore, a statistical analysis of the articles has been performed, followed by an in-depth evaluation of the efficiency and efficacy of more than 30 renowned NLP models on 13 evaluation benchmarks. This paper offers valuable insights for researchers, professionals and scholars, and explores the trend of research toward sustainable practices in NLP.

preprint, 2022, arXiv

Asymptotically Improved Circuit for $d$-ary Grover's Algorithm with Advanced Decomposition of $n$-qudit Toffoli Gate

The progress in building quantum computers to execute quantum algorithms has recently been remarkable. Grover's search algorithm in a binary quantum system provides considerable speed-up over the classical paradigm. Further, Grover's algorithm can be extended to a $d$-ary (qudit) quantum system to exploit the larger state space, which helps to reduce the run-time of the algorithm compared to traditional binary quantum systems. In a qudit quantum system, an $n$-qudit Toffoli gate plays a significant role in the accurate implementation of Grover's algorithm. In this article, a generalized $n$-qudit Toffoli gate has been realized using higher dimensional qudits to attain a logarithmic depth decomposition without ancilla qudits. The circuit for Grover's algorithm has then been designed for any $d$-ary quantum system, where $d \ge 2$, with the proposed $n$-qudit Toffoli gate to obtain optimized depth compared to earlier approaches. The technique for decomposing an $n$-qudit Toffoli gate requires access to two immediately higher energy levels, making the design susceptible to errors. Nevertheless, we show that the percentage decrease in the probability of error is significant, as we have reduced both gate count and circuit depth compared to state-of-the-art works.
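
As a rough illustration of Grover's search over a $d$-ary register, the sketch below simulates the amplitudes classically for a search space of $N = d^n$ basis states with one marked item. It models only the abstract oracle and diffusion steps, not the $n$-qudit Toffoli decomposition proposed in the paper. The function name and the choice of $\lfloor \frac{\pi}{4}\sqrt{N} \rfloor$ iterations are assumptions of this sketch.

```python
import math

def grover_success_probability(d, n, marked=0):
    """Simulate Grover search over N = d**n basis states with one marked item."""
    size = d ** n
    iterations = math.floor(math.pi / 4 * math.sqrt(size))
    amps = [1 / math.sqrt(size)] * size          # uniform superposition
    for _ in range(iterations):
        amps[marked] = -amps[marked]             # oracle: phase flip on the marked state
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]      # diffusion: inversion about the mean
    return iterations, amps[marked] ** 2

# Two qutrits (d = 3, n = 2) give a search space of 9 states.
iters, prob = grover_success_probability(d=3, n=2)
```

The simulated probability can be compared against the closed form $\sin^2((2t+1)\theta)$ with $\theta = \arcsin(1/\sqrt{N})$ and $t$ iterations.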

preprint, 2022, arXiv

Intermediate Qutrit-based Improved Quantum Arithmetic Operations with Application on Financial Derivative Pricing

In some quantum algorithms, arithmetic operations are of utmost importance for resource estimation. In binary quantum systems, efficient implementations of arithmetic operations such as addition/subtraction, multiplication/division, square root, exponential and arcsine have been realized, where resources are reported as a number of Toffoli gates or T gates with ancilla. Recently it has been demonstrated that intermediate qutrits can be used in place of ancilla, allowing us to operate efficiently in the ancilla-free frontier zone. In this article, we incorporate the intermediate qutrit approach to realize efficient implementations of all the quantum arithmetic operations mentioned above with respect to gate count and circuit depth, without T gates or ancilla. Our resource estimates with intermediate qutrits could guide future research aimed at lowering the cost of arithmetic operations for computational problems. Computational problems related to finance are poised to reap the benefit of quantum computers, and quantum arithmetic circuits are going to play an important role in them. In particular, quantum arithmetic circuits for arcsine and square root are necessary for path loading using the re-parameterization method, as well as for the payoff calculation in derivative pricing. Hence, the improvements are studied in the context of the core arithmetic circuits as well as the complete application of derivative pricing. Our intermediate qutrit approach requires access to higher energy levels, which makes the design prone to errors; nevertheless, we show that the percentage decrease in the probability of error is significant, owing to the circuit robustness we achieve compared to qubit-only works.

preprint, 2022, arXiv

Towards Power Efficient DNN Accelerator Design on Reconfigurable Platform

The rapid emergence of Field Programmable Gate Arrays (FPGAs) has accelerated research on hardware implementations of Deep Neural Networks (DNNs). Among all DNN processors, domain specific architectures such as Google's Tensor Processing Unit (TPU) have outperformed conventional GPUs. However, implementations of TPUs in reconfigurable hardware should emphasize energy savings to serve the green computing requirement. Voltage scaling, a popular approach to energy savings, can be critical in FPGAs, as it may cause timing failure if not done appropriately. In this work, we present an ultra low power FPGA implementation of a TPU for edge applications. We divide the systolic array of a TPU into different FPGA partitions, where each partition uses a different near-threshold computing (NTC) biasing voltage to run its FPGA cores. The biasing voltage for each partition is roughly calculated by the proposed static schemes, and further calibration of the biasing voltage is done by the proposed runtime scheme. Four clustering algorithms, based on the minimum slack value of the different design paths of the multiply-accumulate units (MACs), are studied for partitioning the FPGA. To overcome the timing failure caused by NTC, the MACs with higher minimum slack are placed in lower voltage partitions and the MACs with lower minimum slack are placed in higher voltage partitions. The proposed architecture is simulated on a commercial platform, Vivado with a Xilinx Artix-7 FPGA, and on the academic platform VTR with 22nm, 45nm and 130nm FPGAs. The simulation results substantiate the implementation of a voltage scaled TPU in FPGAs and also justify its power efficiency.

preprint, 2021, arXiv

Circuit Design for $k$-coloring Problem and Its Implementation in Any Dimensional Quantum System

With the evolution of quantum computing, researchers nowadays tend to seek solutions to NP-complete problems using quantum algorithms in order to gain an asymptotic advantage. In this paper, we solve the $k$-coloring problem (an NP-complete problem) using Grover's algorithm in any dimensional quantum system, that is, any $d$-ary quantum system with $d \ge 2$, for the first time to the best of our knowledge. A newly proposed comparator-based approach helps to generalize the implementation of the $k$-coloring problem in any dimensional quantum system. To date, the $k$-coloring problem has been implemented only in binary and ternary quantum systems; hence, we restrict ourselves to $d=2$ and $d=3$ when comparing our proposed work with the state-of-the-art techniques. The proposed approach reduces the qubit cost compared to the state-of-the-art binary quantum systems. Further, with the help of a newly proposed ternary comparator, a substantial reduction in quantum gate count for the ternary oracle circuit of the $k$-coloring problem over previous approaches has been obtained. An end-to-end automated framework has been put forward for implementing the $k$-coloring problem for any undirected and unweighted graph on any available near-term quantum device, Noisy Intermediate-Scale Quantum (NISQ) device or multi-valued quantum simulator, which helps in generalizing our approach.

preprint, 2021, arXiv

Circuit Design for Clique Problem and Its Implementation on Quantum Computer

Finding cliques in a graph has several applications owing to its pattern matching ability. The $k$-clique problem, a special case of the clique problem that determines whether an arbitrary graph contains a clique of size $k$, has already been addressed in the quantum domain. A variant of the $k$-clique problem that lists all cliques of size $k$ also has popular modern-day applications. However, the implementation of this variant in the quantum setting remains untouched. In this paper, apart from a theoretical solution of this $k$-clique problem, a practical quantum gate-based implementation is addressed using Grover's algorithm. This approach is further extended to design a circuit for the maximum clique problem in a classical-quantum hybrid architecture. The algorithm automatically generates the circuit for any given undirected and unweighted graph and any given $k$, which makes our approach general. The proposed approach to solving the $k$-clique problem exhibits a reduction in qubit cost and circuit depth compared to the state-of-the-art approach, for a small $k$ with respect to a large graph. A framework that can map the automatically generated circuit for the clique problem to quantum devices is also proposed. An analysis of the experimental results is demonstrated using IBM's Qiskit.
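
A classical brute-force reference for the listing variant of the $k$-clique problem is handy for checking small instances. The sketch below is not the paper's quantum circuit. It only shows the predicate a Grover oracle would have to mark, namely that every pair in a candidate vertex set is adjacent. The function name and the example graph are hypothetical.

```python
from itertools import combinations

def list_k_cliques(edges, k):
    """Return every size-k vertex set in which all pairs are adjacent."""
    adjacent = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    return [
        subset
        for subset in combinations(vertices, k)
        if all(frozenset(pair) in adjacent for pair in combinations(subset, 2))
    ]

# Hypothetical undirected, unweighted graph: a triangle with one pendant vertex.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = list_k_cliques(edges, 3)
```

The classical search inspects all $\binom{|V|}{k}$ candidate sets, which is the search space a Grover-based approach would explore quadratically faster.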

preprint, 2021, arXiv

Faster Search of Clustered Marked States with Lackadaisical Quantum Walks

The nature of discrete-time quantum walk in the presence of multiple marked states has been studied by Nahimovs and Rivosh. They introduced an exceptional configuration of clustered marked states, i.e., one in which the marked states are arranged in a $\sqrt{k} \times \sqrt{k}$ cluster within a $\sqrt{N} \times \sqrt{N}$ grid, where $k=n^{2}$ and $n$ is an odd integer. They showed that finding a single marked state among the multiple ones using quantum walk with the AKR (Ambainis, Kempe and Rivosh) coin requires $\Omega(\sqrt{N} - \sqrt{k})$ time. Furthermore, Nahimovs and Rivosh also showed that Grover's coin can find the same configuration of marked states both faster and with higher probability than the AKR coin. In this article, we show that using lackadaisical quantum walk, a variant of a three-state discrete-time quantum walk on a line, the success probability of finding all the clustered marked states of this exceptional configuration is nearly 1 with smaller run-time. We also show that the weights of the self-loop suggested for multiple marked states in the state-of-the-art works are not optimal for this exceptional configuration of clustered marked states. We propose a range of weights of the self-loop from which only one can give the desired result for this configuration.

preprint, 2021, arXiv

Moving Quantum States without SWAP via Intermediate Higher Dimensional Qudits

Quantum algorithms can be realized in the form of a quantum circuit. To map the quantum circuit for a specific quantum algorithm to quantum hardware, qubit mapping based on the qubit topology is an essential technique. Due to the neighbourhood constraint of the qubit topology, moving information around in a quantum computer is essential to implementing a quantum algorithm correctly. Swapping qubits using a SWAP gate moves the quantum state between two qubits and resolves the neighbourhood constraint of the qubit topology. However, the SWAP gate must be decomposed into three CNOT gates to be implemented efficiently, which unavoidably increases quantum cost in terms of gate count and depth. In this paper, a new formalism for moving quantum states without using the SWAP operation is introduced for the first time, to the best of our knowledge. Moving quantum states through qubits is achieved by adopting temporary intermediate qudit states. This introduction of intermediate qudit states exhibits a three times reduction in quantum cost with respect to gate count and approximately a two times reduction with respect to circuit depth compared to the state-of-the-art approach of SWAP gate insertion. Further, the proposed approach is generalized to any dimensional quantum system.
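
The three-CNOT decomposition of SWAP that serves as the baseline cost here can be checked on classical basis states. CNOT and SWAP are both permutations of the computational basis, so agreement on all four two-qubit basis states implies the unitaries agree by linearity. The helper names below are illustrative only, and the sketch does not model the paper's intermediate-qudit scheme.

```python
def cnot(control, target):
    """Return a function applying CNOT to a tuple of classical basis bits."""
    def gate(bits):
        out = list(bits)
        out[target] ^= bits[control]
        return tuple(out)
    return gate

def swap_via_three_cnots(bits):
    """Apply CNOT(0->1), CNOT(1->0), CNOT(0->1) in sequence."""
    for gate in (cnot(0, 1), cnot(1, 0), cnot(0, 1)):
        bits = gate(bits)
    return bits

basis_states = [(a, b) for a in (0, 1) for b in (0, 1)]
results = {bits: swap_via_three_cnots(bits) for bits in basis_states}
```

Every entry of `results` maps `(a, b)` to `(b, a)`, which is the action of SWAP.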

preprint, 2021, arXiv

Qurzon: A Prototype for a Divide and Conquer Based Quantum Compiler

When working with algorithms on quantum devices, quantum memory becomes a crucial bottleneck due to low qubit count in NISQ-era devices. In this context, the concept of 'divide and compute', wherein a quantum circuit is broken into several subcircuits and executed separately, while stitching the results of the circuits via classical post-processing, becomes a viable option, especially in NISQ-era devices. This paper introduces Qurzon, a proposed novel quantum compiler that incorporates the marriage of techniques of divide and compute with the state-of-the-art algorithms of optimal qubit placement for executing on real quantum devices. A scheduling algorithm is also introduced within the compiler that can explore the power of distributed quantum computing while paving the way for quantum parallelism for large algorithms. Several benchmark circuits have been executed using the compiler, thereby demonstrating the power of the divide and compute when working with real NISQ-era quantum devices.

preprint, 2020, arXiv

2D Qubit Placement of Quantum Circuits using LONGPATH

Quantum computing was introduced to achieve speedup over conventional classical computing in finding solutions to computationally hard problems. Quantum algorithms can be simulated in a pseudo quantum environment, but implementation involves realization of quantum circuits through physical synthesis of quantum gates. This requires decomposition of complex quantum gates into a cascade of simple one qubit and two qubit gates. The methodological framework for physical synthesis imposes a constraint regarding placement of operands (qubits) and operators. If physical qubits are placed on a grid, where each node of the grid represents a qubit, then quantum gates can only be operated on adjacent qubits; otherwise SWAP gates must be inserted to convert a non-Linear Nearest Neighbor architecture to a Linear Nearest Neighbor architecture. Insertion of SWAP gates should be made optimal to reduce the cumulative cost of physical implementation. A schedule layout generation is required for placement and routing prior to actual implementation. In this paper, two algorithms are proposed to optimize the number of SWAP gates in any arbitrary quantum circuit. The first algorithm starts with generation of an interaction graph, followed by finding the longest path starting from the node with maximum degree. The second algorithm optimizes the number of SWAP gates between any pair of non-neighbouring qubits. Our proposed approach achieves a significant reduction in the number of SWAP gates in 1D and 2D NTC architecture.
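
The first step described in this abstract, building an interaction graph and finding a long path from the maximum-degree node, can be sketched as follows. The abstract does not say how the longest path is computed. The exhaustive depth-first search used here is an assumption that is only practical for small circuits, and the gate list and helper names are hypothetical.

```python
from collections import defaultdict

def interaction_graph(gates):
    """Map each qubit to the set of qubits it shares a two-qubit gate with."""
    graph = defaultdict(set)
    for a, b in gates:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def longest_path_from(graph, start):
    """Exhaustive depth-first search for the longest simple path (small graphs only)."""
    best = [start]

    def visit(path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in sorted(graph[path[-1]]):
            if nxt not in path:
                path.append(nxt)
                visit(path)
                path.pop()

    visit([start])
    return best

gates = [(0, 1), (1, 2), (1, 3), (3, 4)]                  # hypothetical two-qubit gate list
graph = interaction_graph(gates)
start = max(sorted(graph), key=lambda q: len(graph[q]))   # maximum-degree qubit
chain = longest_path_from(graph, start)
```

Qubits along `chain` could then be laid out on neighbouring physical positions so that the gates between them need no SWAP insertion.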

preprint, 2020, arXiv

The Blockchain Based Auditor on Secret key Life Cycle in Reconfigurable Platform

The growing sophistication of cyber attacks, vulnerabilities in high computing systems and the increasing dependency on cryptography to protect our digital data make it more important to keep secret keys safe and secure. A few major issues with secret keys, such as incorrect use of keys, inappropriate storage of keys, inadequate protection of keys, insecure movement of keys, lack of audit logging, insider threats and non-destruction of keys, can dangerously compromise the whole security system. In this article, we propose and implement an isolated secret key memory which can log the life cycle of secret keys cryptographically using blockchain (BC) technology. We have also implemented a special custom bus interconnect which receives custom crypto instructions from the Processing Element (PE). During the execution of crypto instructions, the architecture assures that the secret key never enters the processor area, and the movement of secret keys to the various crypto cores is recorded cryptographically after a proper authentication process controlled by the proposed hardware based BC. To the best of our knowledge, this is the first work which uses a blockchain based solution to address the issues of the life cycle of secret keys in a hardware platform. The additional cost in resource usage and timing complexity incurred to implement the proposed idea is very nominal. We have used the Xilinx Vivado EDA tool and an Artix 7 FPGA board.

preprint, 2016, arXiv

A review on application of data mining techniques to combat natural disasters

Thousands of human lives are lost every year around the globe, apart from significant damage to property, animal life, etc., due to natural disasters (e.g., earthquake, flood, tsunami, hurricane and other storms, landslides, cloudburst, heat wave, forest fire). In this paper, we focus on reviewing the application of data mining and analytical techniques designed so far for (i) prediction, (ii) detection, and (iii) development of appropriate disaster management strategy based on the collected data from disasters. A detailed description of the availability of data from geological observatories (seismological, hydrological), satellites, remote sensing and newer sources like social networking sites such as Twitter is presented. An extensive and in-depth literature study on current techniques for disaster prediction, detection and management has been done, and the results are summarized according to various types of disasters. Finally, a framework for building a disaster management database for India, hosted on an open source Big Data platform like Hadoop, in a phased manner has been proposed. The study has a special focus on India, which ranks among the top five countries in terms of absolute number of human lives lost.

preprint, 2016, arXiv

Accelerating More Secure RC4 : Implementation of Seven FPGA Designs in Stages upto 8 byte per clock

RC4 can be made more secure if an additional RC4-like Post-KSA Random Shuffling (PKRS) process is introduced between KSA and PRGA. It can also be made significantly faster if RC4 bytes are processed in an FPGA embedded system using multiple coprocessors functioning in parallel. The PKRS process is tuned to form as many S-boxes as required by particular design architectures involving multiple coprocessors, each one undertaking byte-by-byte processing. Following a recent idea [1] [2], the speed of execution of each processor is further enhanced if the byte-by-byte processing is replaced by a scheme of processing two consecutive bytes together. Adopting some new concepts, three hardware design architectures are proposed in a suitable FPGA embedded system involving 1, 2 and 4 coprocessors functioning in parallel, and a study is made on accelerating RC4 by processing bytes in byte-by-byte mode, achieving throughputs from 1-byte-in-1-clock to 4-bytes-in-1-clock. The hardware designs are appropriately upgraded to accelerate RC4 further by processing 2 consecutive RC4 bytes together, and it has been possible to achieve a maximum throughput of 8 bytes per clock in the Xilinx Virtex-5 LX110t FPGA [3] architecture, followed by secured data communication between two FPGA boards.
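
For readers unfamiliar with the two RC4 stages named above, the following is a minimal software sketch of the standard key-scheduling algorithm (KSA) and pseudo-random generation algorithm (PRGA). The PKRS stage and the multi-coprocessor pipelining are specific to the paper and are not reproduced, since the abstract does not define them. The function names are chosen here for illustration.

```python
def ksa(key):
    """Key-Scheduling Algorithm: build the initial 256-byte S-box from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s

def prga(s):
    """Pseudo-Random Generation Algorithm: yield one keystream byte per step."""
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]

def rc4(key, data):
    """XOR data with the RC4 keystream; the same call encrypts and decrypts."""
    stream = prga(ksa(key))
    return bytes(b ^ next(stream) for b in data)

ciphertext = rc4(b"Key", b"Plaintext")
recovered = rc4(b"Key", ciphertext)
```

Each PRGA step emits one byte and depends on the swap performed in the previous step, which is the serial dependency that byte-per-clock and multi-byte-per-clock hardware designs have to work around.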

preprint, 2015, arXiv

A Brief Survey of Recent Edge-Preserving Smoothing Algorithms on Digital Images

Edge preserving filters preserve edges and their information while blurring an image. In other words, they are used to smooth an image while reducing blurring artifacts across edges, such as halos and phantoms. They are nonlinear in nature. Examples are the bilateral filter, anisotropic diffusion filter, guided filter and trilateral filter. This family of filters is very useful for reducing noise in an image, which puts it in high demand in computer vision and computational photography applications like denoising, video abstraction, demosaicing, optical-flow estimation, stereo matching, tone mapping, style transfer and relighting. This paper provides a concrete introduction to edge preserving filters, from the early heat diffusion equation to recent developments, an overview of their numerous applications, as well as mathematical analysis, various efficient and optimized ways of implementation and their interrelationships, keeping the focus on preserving boundaries, spikes and canyons in the presence of noise. Furthermore, it provides a realistic notion of efficient implementation, with a research scope for hardware realization for further acceleration.
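
The bilateral filter mentioned above illustrates why these filters are nonlinear: each neighbour's weight depends on both its distance and its intensity difference from the centre sample. Below is a minimal one-dimensional sketch with Gaussian spatial and range kernels. The parameter names, default values and the sample signal are assumptions for illustration, not taken from the survey.

```python
import math

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=1.0):
    """Smooth a 1-D signal with weights that fall off in both distance and intensity."""
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            w_range = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2))
            num += w_space * w_range * signal[j]
            den += w_space * w_range
        out.append(num / den)
    return out

step = [0.0, 0.2, -0.1, 0.1, 10.0, 9.9, 10.2, 10.0]   # noisy step edge (made-up values)
smoothed = bilateral_1d(step)
```

Samples on the far side of the step receive a negligible range weight, so the small fluctuations on each plateau are averaged while the jump itself stays sharp.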

preprint, 2015, arXiv

A Novel Approach for Human Action Recognition from Silhouette Images

In this paper, a novel technique for human action recognition from video is presented. Any human action is a combination of several micro action sequences performed by one or more body parts. The proposed approach uses spatio-temporal body parts movement (STBPM) features extracted from the foreground silhouette of the human objects. The newly proposed STBPM feature estimates the movements of different body parts for any given time segment to classify actions. We also propose a rule based logic named rule action classifier (RAC), which uses a series of condition action rules based on prior knowledge and hence does not require training to classify any action. Since we do not require training to classify actions, the proposed approach is view independent. The experimental results on the publicly available Weizmann and MuHAVi datasets are compared with those of related research work in terms of accuracy in human action detection, and the proposed technique outperforms the others.

preprint, 2015, arXiv

A Novel Method for Soft Error Mitigation in FPGA using Adaptive Cross Parity Code

Field Programmable Gate Arrays (FPGAs) are more prone to be affected by transient faults in presence of radiation and other environmental hazards compared to Application Specific Integrated Circuits (ASICs). Hence, error mitigation and recovery techniques are absolutely necessary to protect the FPGA hardware from soft errors arising due to such transient faults. In this paper, a new efficient multi-bit error correcting method for FPGAs is proposed using adaptive cross parity check (ACPC) code. ACPC is easy to implement and the needed decoding circuit is also simple. In the proposed scheme total configuration memory is partitioned into two parts. One part will contain ACPC hardware, which is static and assumed to be unaffected by any kind of errors. Other portion will store the binary file for logic, which is to be protected from transient error and is assumed to be dynamically reconfigurable (Partial reconfigurable area). Binary file from the secondary memory passes through ACPC hardware and the bits for forward error correction (FEC) field are calculated before entering into the reconfigurable portion. In the runtime scenario, the data from the dynamically reconfigurable portion of the configuration memory will be read back and passed through the ACPC hardware. The ACPC hardware will correct the errors before the data enters into the dynamic configuration memory. We propose a first of its kind methodology for novel transient fault correction using ACPC code for FPGAs. To validate the design we have tested the proposed methodology with Kintex FPGA. We have also measured different parameters like critical path, power consumption, overhead resource and error correction efficiency to estimate the performance of our proposed method.
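
The general idea behind cross parity codes can be illustrated with plain row and column parity over a bit matrix, where a single flipped bit sits at the intersection of the one failing row and the one failing column. This is only the textbook two-dimensional parity scheme. The adaptive, multi-bit ACPC construction of the paper is not described in the abstract and is not reproduced here, and all names and data below are made up.

```python
def parities(block):
    """Row and column parity bits of a rectangular bit matrix."""
    rows = [sum(r) % 2 for r in block]
    cols = [sum(c) % 2 for c in zip(*block)]
    return rows, cols

def correct_single_error(block, stored_rows, stored_cols):
    """Flip the bit at the intersection of the one failing row and column, if any."""
    rows, cols = parities(block)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, stored_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, stored_cols)) if a != b]
    fixed = [list(r) for r in block]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed

original = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]   # made-up configuration bits
row_par, col_par = parities(original)

corrupted = [list(r) for r in original]
corrupted[1][2] ^= 1                                     # inject one upset
repaired = correct_single_error(corrupted, row_par, col_par)
```

In the flow the abstract describes, the parity fields would be computed once when the binary file is loaded and checked again whenever the configuration memory is read back.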

preprint, 2015, arXiv

A Novel Reconfigurable Hardware Design for Speech Enhancement Based on Multi-Band Spectral Subtraction Involving Magnitude and Phase Components

This paper proposes an efficient reconfigurable hardware design for speech enhancement based on a multi-band spectral subtraction algorithm involving both magnitude and phase components. Our proposed design is novel as it estimates environmental noise from speech adaptively, utilizing both magnitude and phase components of the speech spectrum. We perform multi-band spectral subtraction by dividing the noisy speech spectrum into different non-uniform frequency bands with varying signal to noise ratio (SNR) and subtracting the estimated noise from each of these frequency bands. This results in the elimination of noise from both high SNR and low SNR signal components for all the frequency bands. We call our proposed speech enhancement technique Multi Band Magnitude Phase Spectral Subtraction (MBMPSS). The magnitude and phase operations are executed concurrently, exploiting the parallel logic blocks of a Field Programmable Gate Array (FPGA), thus increasing the throughput of the system to a great extent. We have implemented our design on a Spartan6 Lx45 FPGA and present the implementation results in terms of resource utilization and delay information for the different blocks of our design. To the best of our knowledge, this is a new type of hardware design for speech enhancement applications and also a first of its kind implementation on reconfigurable hardware. We have used benchmark audio data for the evaluation of the proposed hardware, and the experimental results show that our hardware achieves a better SNR value than existing state of the art research works.

preprint, 2015, arXiv

FPGA based High Speed Data Acquisition System for High Energy Physics Application

In high energy physics (HEP) experiments, high speed and fault resilient data communication is needed between detectors/sensors and the host PC. Transient faults can occur in the communication hardware due to various external effects such as the presence of charged particles, noise in the environment or radiation effects in HEP experiments, and these lead to single or multiple bit errors. In order to keep the communication system functional in such a radiation environment, where direct human intervention is not possible, a high speed data acquisition (DAQ) architecture that supports error recovery is necessary. This design presents an efficient implementation of a field programmable gate array (FPGA) based high speed DAQ system with an optical communication link supported by a multi-bit error correcting model. The design has been implemented on a Xilinx Kintex-7 board and is tested for board to board communication as well as for PC communication using PCIe (Peripheral Component Interconnect express). Data communication speeds of up to 4.8 Gbps have been achieved in board to board and board to PC communication, and resource utilization and critical path delay are also estimated.

preprint, 2015, arXiv

FPGA based Novel High Speed DAQ System Design with Error Correction

Present state of the art applications in the areas of high energy physics (HEP) experiments, radar communication, satellite communication and biomedical instrumentation require a fault resilient data acquisition (DAQ) system with a data rate on the order of Gbps. In order to keep the high speed DAQ system functional in a radiation environment where direct human intervention is not possible, a robust and error free communication system is necessary. In this work we present an efficient DAQ design and its implementation on a field programmable gate array (FPGA). The proposed DAQ system supports high speed data communication (~4.8 Gbps) and achieves multi-bit error correction capabilities. BCH code (named after Bose, Ray-Chaudhuri and Hocquenghem) has been used for multi-bit error correction. The design has been implemented on a Xilinx Kintex-7 board and is tested for board to board communication as well as for board to PC communication using the PCIe (Peripheral Component Interconnect express) interface. To the best of our knowledge, the proposed FPGA based high speed DAQ system utilizing an optical link and multi-bit error resiliency can be considered the first of its kind. Performance estimation of the implemented DAQ system is done based on resource utilization, critical path delay, efficiency and bit error rate (BER).

preprint, 2015, arXiv

High speed fault tolerant secure communication for muon chamber using fpga based gbt emulator

The Compressed Baryonic Matter (CBM) experiment is a part of the Facility for Antiproton and Ion Research (FAIR) in Darmstadt at the GSI. The CBM experiment will investigate highly compressed nuclear matter using nucleus-nucleus collisions. This experiment will examine heavy-ion collisions in fixed target geometry and will be able to measure hadrons, electrons and muons. CBM requires precise time synchronization, compact hardware, radiation tolerance, self-triggered front-end electronics, efficient data aggregation schemes and the capability to handle high data rates (up to several TB/s). As a part of the implementation of the read out chain of MUCH in India, we have tried to implement an FPGA based emulator of GBTx. GBTx is a radiation tolerant ASIC, developed by CERN, that can be used to implement multipurpose high speed bidirectional optical links for high-energy physics (HEP) experiments. GBTx will be used in highly irradiated areas and is therefore more prone to multi-bit errors. To mitigate this effect, instead of the single bit error correcting RS code we have used the two bit error correcting (15, 7) BCH code. It increases the redundancy, which in turn increases the reliability of the coded data, so the coded data will be less affected by noise due to radiation. Data will go from the detector to the PC through multiple nodes of the communication channel. In order to make the data communication secure, the advanced encryption standard (AES - a symmetric key cryptography) and RSA (asymmetric key cryptography) are used after the channel coding.
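
The (15, 7) BCH code mentioned above can be sketched in software with polynomials over GF(2) stored as integers. The generator polynomial $g(x) = x^8 + x^7 + x^6 + x^4 + 1$ is the standard one for this code. The systematic encoder and the brute-force nearest-codeword decoder are assumptions of this sketch and are not the hardware decoder used in the paper.

```python
from itertools import combinations

G = 0b111010001   # g(x) = x^8 + x^7 + x^6 + x^4 + 1, the usual (15, 7) BCH generator

def poly_mod(a, g=G):
    """Remainder of a(x) divided by g(x) over GF(2), polynomials stored as ints."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a

def encode(message):
    """Systematic encoding: 7 message bits followed by 8 parity bits."""
    shifted = message << 8
    return shifted | poly_mod(shifted)

CODEBOOK = {encode(m): m for m in range(128)}

def decode(received):
    """Brute-force nearest-codeword decoding; fine for only 128 codewords."""
    best = min(CODEBOOK, key=lambda c: bin(c ^ received).count("1"))
    return CODEBOOK[best]

# Minimum Hamming distance between distinct codewords.
min_distance = min(bin(a ^ b).count("1") for a, b in combinations(CODEBOOK, 2))

message = 0b1011001
received = encode(message) ^ (1 << 0) ^ (1 << 9)   # two injected bit errors
decoded = decode(received)
```

Correcting any two bit errors requires a minimum distance of at least 5, which `min_distance` lets one confirm for this generator. The price is 8 parity bits for every 7 data bits, the added redundancy the abstract refers to.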

preprint2014arXiv

An Approach for Reducing Outliers of Non Local Means Image Denoising Filter

We propose an adaptive approach to non-local means (NLM) image filtering, termed non-local adaptive clipped means (NLACM), which reduces the effect of outliers and improves the denoising quality compared to traditional NLM. A common method of excluding outliers from a data population is to compute the mean within a range defined by the mean and standard deviation. In NLACM we compute the median within a range defined by statistical estimation over the neighbourhood region of the pixel to be denoised. Because the parameters of the range do not depend on any additional input and are based on local intensity values, the approach is adaptive. Experimental results for NLACM show better estimation of the true intensity from noisy neighbourhood observations than NLM at high noise levels. We have verified the technique for speckle noise reduction and tested it on ultrasound (US) images of the lumbar spine. These ultrasound images guide injection therapy for the treatment of lumbar radiculopathy. We believe that the proposed approach to image denoising is the first of its kind, and its efficiency is justified by its better performance in image restoration.
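The idea of taking a median inside a range set by the local mean and standard deviation can be sketched as follows. This is a simplified illustration of the clipping step only, not the full NLACM filter: the width factor `k` and the function name are assumptions, and the patch-similarity weighting of NLM is omitted.

```python
import statistics


def clipped_median(values, k=1.0):
    """Median of the samples that lie within mean +/- k * std of the neighbourhood.

    k is a hypothetical tuning factor; the abstract does not state how the
    range is scaled.
    """
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    low, high = mean - k * spread, mean + k * spread
    kept = [v for v in values if low <= v <= high]
    # Fall back to the plain median if the range excludes every sample.
    return statistics.median(kept) if kept else statistics.median(values)


if __name__ == "__main__":
    neighbourhood = [10, 11, 12, 13, 100]  # one bright outlier
    print(statistics.fmean(neighbourhood), clipped_median(neighbourhood))
```

In the example the plain mean is pulled towards the outlier, while the clipped median stays close to the bulk of the intensities.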

preprint2014arXiv

Design space exploration for image processing architectures on FPGA targets

Owing to the emergence of embedded applications in image and video processing, communication and cryptography, the improvement of pictorial information for better human perception, such as deblurring and denoising in fields like satellite imaging, medical imaging and mobile applications, is gaining renewed research interest. These developments are driven primarily by advances in semiconductor technology leading to FPGA-based programmable logic devices, which combine the advantages of custom hardware and dedicated DSP resources. In addition, FPGAs provide a powerful reconfiguration feature and hence are an ideal target for rapid prototyping. We have endeavoured to exploit the hardware parallelism of FPGA technology, which leads to higher computational density and throughput, and have observed better performance than one gets by merely porting image processing software algorithms to hardware. In this paper, we present an elaborate review, based on our expertise and experience, of the transformations needed to map an image processing software algorithm to hardware, including the optimization techniques that make its operation in hardware comparatively faster.

preprint2014arXiv

Fault Detection for RC4 Algorithm and its Implementation on FPGA Platform

In a hardware implementation of a cryptographic algorithm, an attacker may leak secret information by introducing controlled faulty bit(s), even though the algorithm is mathematically secure. The technique is very effective against crypto processors embedded in smart cards. In this paper, a few fault-detecting architectures for the RC4 algorithm are designed and implemented on a Virtex-5 (ML505, LX110T) FPGA board. The results indicate that the proposed architectures can handle most faults without loss of throughput, at the cost of marginally more hardware and power.

preprint2014arXiv

Hardware Implementation of four byte per clock RC4 algorithm

In the field of cryptography, 2-byte in 1-clock is to date the best known RC4 hardware design [1], while 1-byte in 1-clock [2] and 1-byte in 3 clocks [3][4] are the best known implementations. The design algorithm in [2] considers two consecutive bytes together and processes them in 2 clocks. The design in [1] is a pipelined architecture of [2]. The 1-byte in 3-clocks design is overly modular and clock hungry. In this paper, taking the RC4 algorithm as it is, a simpler RC4 hardware design providing higher throughput is proposed in six different architectures. Design 1 processes 1 byte in 1 clock, and Design 2 is a dynamic KSA-PRGA architecture of Design 1. Design 3 can process 2 bytes in a single clock, whereas Design 4 is a dynamic KSA-PRGA architecture of Design 3. Design 5 and Design 6 are parallelized architectures of Design 2 and Design 4 that can compute 4 bytes in a single clock. The designs mature progressively from Design 1 to Design 6 in terms of throughput, power consumption and resource usage. The RC4 encryption and decryption designs are embedded on two FPGA boards, respectively, as coprocessor hardware, and the two boards communicate over Ethernet.
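For readers unfamiliar with the KSA and PRGA stages named above, a plain software reference of RC4 may help. This is a sketch of the textbook algorithm producing one keystream byte per loop iteration; it does not model any of the six hardware designs, and the function names are hypothetical.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` keystream bytes with textbook RC4."""
    # KSA (key scheduling algorithm): permute the 256-entry state using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # PRGA (pseudo-random generation algorithm): one swap and one output byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    stream = rc4_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    cipher = rc4_crypt(b"Key", b"Plaintext")
    print(cipher.hex())
    print(rc4_crypt(b"Key", cipher))
```

Each PRGA step depends on the swap made in the previous step, which is the data dependency that multi-byte-per-clock hardware designs have to work around.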

preprint2014arXiv

Multi Core SSL/TLS Security Processor Architecture Prototype Design with automated Preferential Algorithm in FPGA

In this paper, a pipelined architecture of a high-speed network security processor (NSP) for the SSL/TLS protocols is implemented on a system on chip (SoC), where the hardware information of all encryption, hashing and key exchange algorithms is stored in flash memory as bit files, in contrast to related works where all of them are actually implemented in hardware. The NSP finds applications in e-commerce, virtual private networks (VPNs) and other fields that require data confidentiality. The motivation of the present work is to dynamically execute applications with stipulated throughput within a budgeted hardware resource and power. A preferential algorithm for choosing an appropriate cipher suite is proposed, based on an Efficient System Index (ESI) budget comprising power, throughput and resource, given by the user. The bit files of the chosen security algorithms are downloaded from the flash memory to the partial region of the field programmable gate array (FPGA). The proposed SoC controls data communication between an application running in a system, through PCI, and the Ethernet interface of a network. The partial configuration feature is used in the ISE 14.4 suite with the ZYNQ 7z020-clg484 FPGA platform. The performances

preprint2014arXiv

Performance Evaluation of ECC in Single and Multi Processor Architectures on FPGA Based Embedded System

Cryptographic algorithms are computationally costly, and the challenge is greater if we need to execute them in resource-constrained embedded systems. Field Programmable Gate Arrays (FPGAs), having programmable logic devices and processing cores, have proven to be highly feasible implementation platforms for embedded systems, providing shorter design time and reconfigurability. Design parameters like throughput, resource utilization and power requirements are the key issues. The popular Elliptic Curve Cryptography (ECC), which is superior to other public-key cryptosystems like RSA in many ways, such as providing greater security for a smaller key size, is chosen in this work, and the possibilities of its implementation in FPGA-based embedded systems for both single and dual processor core architectures involving task parallelization have been explored. This exploration, which is the first of its kind considering the other existing works, is a needed activity for evaluating the best possible architectural environment for ECC implementation on an FPGA (Virtex4 XC4VFX12, FF668, -10) based embedded platform.

preprint2012arXiv

A simple 1-byte 1-clock RC4 design and its efficient implementation in FPGA coprocessor for secured ethernet communication

In the field of cryptography, 1-byte in 1-clock is to date the best known RC4 hardware design [1], while 1-byte in 3 clocks is the best known implementation [2,3]. The design algorithm in [1] considers two consecutive bytes together and processes them in 2 clocks. The 1-byte in 3-clocks design is overly modular and clock hungry. In this paper, taking the RC4 algorithm as it is, a simpler RC4 hardware design providing higher throughput is proposed in which 1 byte is processed in 1 clock. In the design, two sequential tasks are executed as two independent events during the rising and falling edges of the same clock, and the swapping is executed directly using a MUX-DEMUX combination. The power consumed by the behavioral and structural designs of RC4 is estimated and a power optimization technique is proposed. The NIST statistical test suite is run on RC4 key streams to assess their randomness. The encryption and decryption designs are embedded on two FPGA boards, respectively, with RC4 in a custom coprocessor followed by Ethernet communication.

preprint2012arXiv

A Synthesis Method for Quaternary Quantum Logic Circuits

Synthesis of quaternary quantum circuits involves basic quaternary gates and logic operations in the quaternary quantum domain. In this paper, we propose new projection operations and quaternary logic gates for synthesizing quaternary logic functions. We also demonstrate the realization of the proposed gates using basic quantum quaternary operations. We then employ our synthesis method to design a quaternary adder and some benchmark circuits. Our results, in terms of circuit cost, are better than those of existing works.

preprint2012arXiv

Architecture for real time continuous sorting on large width data volume for fpga based applications

In engineering applications, sorting is an important and widely studied problem where execution speed and the resources used for computation are of extreme importance, especially for real-time data processing. Most traditional sorting techniques start processing only after receiving all of the data, and hence need a large amount of resources for data storage. A suitable design strategy therefore needs to be adopted if we wish to sort a large amount of data in real time, which essentially means higher execution speed and utilization of fewer resources in most cases. This paper proposes a single-chip scalable architecture based on a Field Programmable Gate Array (FPGA) for a modified counting sort algorithm where data acquisition and sorting are done in real time. Our design works efficiently: data can be accepted at run time without any need for prior storage, and the execution speed of our algorithm is invariant to the length of the data stream. The proposed design is implemented and verified on a Spartan 3E (XC3S500E-FG320) FPGA system. The results show that our design is better in terms of some of the design parameters compared to existing research works.
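The property that data can be accepted as they arrive, without storing the stream first, is easiest to see in a software counting sort. The sketch below is the standard counting sort arranged for streaming input; the paper's specific modification and its hardware architecture are not described in the abstract, so the class name and interface are assumptions.

```python
class StreamingCountingSort:
    """Counting sort that updates its histogram one sample at a time."""

    def __init__(self, key_range: int):
        # One counter per possible key value; memory depends on the key range,
        # not on the length of the stream.
        self.counts = [0] * key_range

    def accept(self, value: int) -> None:
        """Consume one incoming sample in constant time."""
        if not 0 <= value < len(self.counts):
            raise ValueError("value outside the configured key range")
        self.counts[value] += 1

    def sorted_output(self):
        """Yield everything received so far in ascending order."""
        for value, count in enumerate(self.counts):
            for _ in range(count):
                yield value


if __name__ == "__main__":
    sorter = StreamingCountingSort(key_range=8)
    for sample in [3, 1, 2, 3, 0]:
        sorter.accept(sample)
    print(list(sorter.sorted_output()))
```

Each `accept` call costs the same amount of work regardless of how many samples came before it, which mirrors the abstract's claim that speed does not depend on stream length.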

preprint2012arXiv

Cobb Angle Measurement of Scoliosis with Reduced Variability

The Cobb angle, a measure of spinal curvature, is the standard method for quantifying the magnitude of scoliosis-related spinal deformity in orthopedics. Determining the Cobb angle manually is subject to human error. In this work, we propose a methodology for measuring the Cobb angle that appreciably reduces the variability of the measurement compared to related works. The proposed methodology uses a new improved version of Non-Local Means (NLM) for image denoising and Otsu's automatic threshold selection for Canny edge detection. We selected NLM for preprocessing the image as it is one of the state-of-the-art techniques for image denoising and helps in retaining image quality. The trimmed mean and the median are more robust to outliers than the mean, and following this concept we observed that NLM denoising quality can be enhanced by using the Euclidean trimmed mean in place of the mean. To demonstrate the better performance of the Non-Local Euclidean Trimmed-mean denoising filter, we provide comparative results of the proposed denoising technique against traditional NLM and Non-Local Euclidean Medians. The experimental results for Cobb angle measurement over intra-observer and inter-observer experimental data reveal the better performance of the proposed approach compared to related works. The MATLAB 2009b image processing toolbox was used for simulation and verification of the proposed methodology.
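Once two endplate lines have been extracted from the edge image, the Cobb angle reduces to the angle between them. The sketch below covers only that final geometric step and assumes each line is given by two hypothetical landmark points; it returns the acute angle in degrees and does not reproduce the denoising or edge detection stages of the methodology.

```python
import math


def angle_between_lines_deg(p1, p2, q1, q2):
    """Acute angle, in degrees, between line p1-p2 and line q1-q2.

    Points are (x, y) tuples; in a Cobb measurement they would lie on the
    endplates of the two most tilted vertebrae (an assumption of this sketch).
    """
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # Absolute values make the result independent of the direction in which
    # each line was traced.
    return math.degrees(math.atan2(abs(cross), abs(dot)))


if __name__ == "__main__":
    upper_endplate = ((0.0, 0.0), (10.0, 2.0))
    lower_endplate = ((0.0, 20.0), (10.0, 16.0))
    print(round(angle_between_lines_deg(*upper_endplate, *lower_endplate), 2))
```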

preprint2012arXiv

Design and implementation of real time AES-128 on real time operating system for multiple FPGA communication

Security is the most important part of a data communication system, where more randomization in secret keys increases both the security and the complexity of cryptographic algorithms. As a result, these algorithms now demand large memory space and long execution time on hardware platforms. Field programmable gate arrays (FPGAs) are one of the major hardware platform alternatives owing to their reconfigurable nature, low price and short time to market. In an FPGA-based embedded system, an embedded processor can execute a particular algorithm under a real-time operating system (RTOS), where threads may reduce resource utilization and time consumption. A process at run time is separated into smaller tasks, which the RTOS scheduler executes to meet real-time deadlines. In this paper we demonstrate the design and implementation of the 128-bit Advanced Encryption Standard (AES) symmetric key encryption and decryption algorithms by developing suitable hardware and software designs on a Xilinx Spartan-3E (XC3S500E-FG320) device using the Xilkernel RTOS. The implementation has been tested successfully. The system is optimized in terms of execution speed and hardware utilization.

preprint2012arXiv

Linear Nearest Neighbor Synthesis of Reversible Circuits by Graph Partitioning

Linear Nearest Neighbor (LNN) synthesis of reversible circuits has emerged as an important issue for the technological implementation of quantum computation. The objective is to obtain an LNN architecture with minimum gate cost. As achieving optimal synthesis is a hard problem, heuristic methods have been proposed in recent literature. In this work we present a graph partitioning based approach for LNN synthesis with reduced circuit cost. In particular, the number of SWAP gates required to convert a given gate-level quantum circuit to its equivalent LNN configuration is minimized. Our algorithm determines the reordering of the indices of the qubit line(s) for both single-control and multiple-control gates. Experimental results for placing the target qubits of the Multiple Controlled Toffoli (MCT) library of benchmark circuits show a significant reduction in gate count and quantum gate cost compared to related research works.
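How qubit line ordering affects the SWAP count can be shown with a small cost model. The sketch below is not the graph partitioning algorithm of the paper: it assumes a naive cost of 2 * (distance - 1) SWAPs per two-qubit gate (move the qubits together, then restore them) and finds the best ordering by exhaustive search, which is only feasible for a handful of qubits. All names are hypothetical.

```python
from itertools import permutations


def swap_cost(gates, order):
    """Total SWAPs needed under a naive move-and-restore model.

    gates: list of (control, target) qubit indices.
    order: tuple giving the qubit placed on each line, top to bottom.
    """
    position = {qubit: line for line, qubit in enumerate(order)}
    return sum(2 * (abs(position[c] - position[t]) - 1) for c, t in gates)


def best_order(gates, num_qubits):
    """Exhaustively search all line orderings for the lowest SWAP cost."""
    return min(permutations(range(num_qubits)), key=lambda o: swap_cost(gates, o))


if __name__ == "__main__":
    circuit = [(0, 3), (0, 1), (2, 3)]
    identity = (0, 1, 2, 3)
    chosen = best_order(circuit, 4)
    print(swap_cost(circuit, identity), chosen, swap_cost(circuit, chosen))
```

A heuristic such as graph partitioning replaces the exhaustive search when the number of qubits makes enumerating all orderings impractical.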

preprint2012arXiv

Outlier detection from ETL Execution trace

Extract, Transform, Load (ETL) is an integral part of Data Warehousing (DW) implementation. The commercial tools used for this purpose capture a large amount of execution trace in the form of various log files containing a plethora of information. However, there has hardly been any initiative to proactively analyse the ETL logs to improve efficiency. In this paper we use an outlier detection technique to find the processes that vary most from the group in terms of execution trace. As our experiment was carried out on actual production processes, we treat any outlier as a signal rather than noise. To identify the input parameters for the outlier detection algorithm, we surveyed a developer community with a varied mix of experience and expertise. We use simple text parsing to extract the features shortlisted from the survey from the logs. Subsequently, we apply a clustering-based outlier detection technique to the logs. By this process we reduced our domain of detailed analysis from 500 logs to 44 logs (8 percent). Among the 5 outlier clusters, 2 are of genuine concern, while the other 3 appear because of the huge number of rows involved.

preprint2012arXiv

Outlier Detection Techniques for SQL and ETL Tuning

The RDBMS is the heart of both OLTP and OLAP applications. For both types of applications, thousands of queries expressed in SQL are executed on a daily basis. All commercial DBMS engines capture various attributes of these executed queries in system tables. These queries need to conform to best practices and need to be tuned to ensure optimal performance. While checklists, and often tools, are used to enforce this, a black box technique for profiling the queries, namely outlier detection, is not employed for a summary-level understanding. This is the motivation of the paper, as it not only points out inefficiencies built into the system but also has the potential to reveal evolving best practices and inappropriate usage. This can reduce latency in information flow and improve the utilization of hardware and software capacity. In this paper we start by formulating the problem. We explore four outlier detection techniques. We apply these techniques to a rich corpus of production queries and analyze the results. We also explore the benefit of an ensemble approach. We conclude with future courses of action. We have used the same philosophy for the optimization of extract, transform, load (ETL) jobs in one of our previous works, and give a brief introduction to it in section four.

preprint2012arXiv

Quantum Ternary Circuit Synthesis Using Projection Operations

Basic logic gates and their operations in the ternary quantum domain are involved in the synthesis of ternary quantum circuits. Only a few works define ternary algebra for ternary quantum logic realization. In this paper, a ternary logic function is expressed in terms of projection operations, including a new one. A method to realize new multi-qutrit ternary gates in terms of generalized ternary gates and projection operations is also presented. We also introduce ten simplification rules for reducing ancilla qutrits and gate levels. Our method yields lower gate cost, fewer gate levels and fewer ancilla qutrits than earlier methods for the ternary benchmark circuits. The $n$-qutrit ternary sum function is synthesized without any ancilla qutrit by our proposed methodology.

preprint2012arXiv

Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

Clustering is one of the main tasks in exploratory data analysis and descriptive statistics, where the main objective is partitioning observations into groups. Clustering has a broad range of applications in varied domains like climate, business, information retrieval, biology and psychology, to name a few. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of attribute values, density, distance, etc. However, these definitions fail to attach a clear meaning or semantics to the generated clusters. We argue that clusters with understandable and distinct semantics defined in terms of quartiles or halves are more appealing to business analysts than clusters defined by data boundaries or prototypes. On the same premise, we propose a new algorithm named the quartile clustering technique. Through a series of experiments we establish the efficacy of this algorithm. We demonstrate that the quartile clustering technique adds clear meaning to each of the clusters compared to K-means. We use the DB Index to measure the goodness of the clusters and show that our method is comparable to EM (Expectation Maximization), PAM (Partition around Medoid) and K-means. We have explored its capability in detecting outliers and the benefit of added semantics. We discuss some of the limitations of its present form and also provide a rough direction for addressing the issue of merging the generated clusters.
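The notion of clusters whose meaning is stated in quartiles can be illustrated by labelling every attribute value with its quartile and grouping rows that share the same label tuple. This is only a sketch of that labelling idea under assumed conventions (quartile cut points from Python's `statistics.quantiles`, hypothetical function names); it is not the published quartile clustering algorithm and omits any merging of clusters.

```python
import bisect
import statistics


def quartile_labels(values):
    """Label each value Q1..Q4 according to the quartile it falls in."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return [f"Q{bisect.bisect_left(cuts, v) + 1}" for v in values]


def quartile_clusters(rows):
    """Group row indices by the tuple of per-attribute quartile labels."""
    columns = list(zip(*rows))
    labels = [quartile_labels(column) for column in columns]
    clusters = {}
    for index, key in enumerate(zip(*labels)):
        clusters.setdefault(key, []).append(index)
    return clusters


if __name__ == "__main__":
    # Two attributes per row, e.g. (revenue, cost) in arbitrary units.
    data = [(i, 9 - i) for i in range(1, 9)]
    for key, members in quartile_clusters(data).items():
        print(key, members)
```

A cluster key such as ("Q1", "Q4") reads directly as "lowest quartile on the first attribute, highest on the second", which is the kind of semantics the abstract argues for.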

preprint2010arXiv

Multiple View Reconstruction of Calibrated Images using Singular Value Decomposition

Calibration in a multi-camera network has been widely studied for many years, starting from the early days of photogrammetry. Many authors have presented calibration algorithms with their relative advantages and disadvantages. In a stereovision system, multiple view reconstruction is a challenging task. However, the total computational procedure has not been presented in detail before. In this work, we deal with the problem that, when a world coordinate point is fixed in space, the image coordinates of that 3D point vary for different camera positions and orientations. From a computer vision perspective, this situation is undesirable. That is, the system has to be designed in such a way that the image coordinate of the world coordinate point is fixed irrespective of the position and orientation of the cameras. We have done it in an elegant fashion. First, the parameters of each camera are calculated in its local coordinate system. Then, we use global coordinate data to transfer all local coordinate data of the stereo cameras into the same global coordinate system, so that we can register everything in this global coordinate system. After all the transformations, when the image coordinate of the world coordinate point is calculated, it gives the same coordinate value for all camera positions and orientations. That is, the whole system is calibrated.
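The registration step, in which observations made in each camera's local frame are moved into one global frame, can be sketched with a rigid transform. The example assumes each camera pose is already known as a rotation matrix R and translation t (hypothetical values below) and shows that two cameras observing the same point agree once mapped to the global frame; it does not perform the SVD-based reconstruction itself.

```python
def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def to_global(rotation, translation, point_local):
    """Map a point from a camera's local frame to the global frame: R p + t."""
    return [a + b for a, b in zip(mat_vec(rotation, point_local), translation)]


def to_local(rotation, translation, point_global):
    """Inverse mapping: R^T (p - t), valid because R is a rotation."""
    shifted = [a - b for a, b in zip(point_global, translation)]
    return mat_vec(transpose(rotation), shifted)


if __name__ == "__main__":
    # Hypothetical poses: camera A at the global origin, camera B rotated 90
    # degrees about the z axis and shifted by (1, 1, 1).
    pose_a = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 0, 0])
    pose_b = ([[0, -1, 0], [1, 0, 0], [0, 0, 1]], [1, 1, 1])
    world_point = [2, 3, 4]
    seen_by_a = to_local(*pose_a, world_point)
    seen_by_b = to_local(*pose_b, world_point)
    print(seen_by_a, seen_by_b)
    print(to_global(*pose_a, seen_by_a), to_global(*pose_b, seen_by_b))
```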

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2406.16893:author:3:amlan-chakrabarti

Imported May 21, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.12350:author:3:amlan-chakrabarti

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2604.25512:author:3:amlan-chakrabarti

Imported May 20, 2026Synced May 20, 2026

11 works

Rourab Paul

Researcher

7 works

Amit Saha

Researcher

7 works

Ranjan Ghosh

Researcher

7 works

Suman Sau

Researcher

39 published item(s)

A New Technique for AI Explainability using Feature Association Map

PHISHREV: A Hybrid Machine Learning and Post-Hoc Non-monotonic Reasoning Framework for Context-Aware Phishing Website Classification

From Transformers to LLMs: A Systematic Survey of Efficiency Considerations in NLP

Asymptotically Improved Circuit for $d$-ary Grover's Algorithm with Advanced Decomposition of $n$-qudit Toffoli Gate

Intermediate Qutrit-based Improved Quantum Arithmetic Operations with Application on Financial Derivative Pricing

Towards Power Efficient DNN Accelerator Design on Reconfigurable Platform

Circuit Design for $k$-coloring Problem and Its Implementation in Any Dimensional Quantum System

Circuit Design for Clique Problem and Its Implementation on Quantum Computer

Faster Search of Clustered Marked States with Lackadaisical Quantum Walks

Moving Quantum States without SWAP via Intermediate Higher Dimensional Qudits

Qurzon: A Prototype for a Divide and Conquer Based Quantum Compiler

2D Qubit Placement of Quantum Circuits using LONGPATH

The Blockchain Based Auditor on Secret key Life Cycle in Reconfigurable Platform

A review on application of data mining techniques to combat natural disasters

Accelerating More Secure RC4 : Implementation of Seven FPGA Designs in Stages upto 8 byte per clock

A Brief Survey of Recent Edge-Preserving Smoothing Algorithms on Digital Images

A Novel Approach for Human Action Recognition from Silhouette Images

A Novel Method for Soft Error Mitigation in FPGA using Adaptive Cross Parity Code

A Novel Reconfigurable Hardware Design for Speech Enhancement Based on Multi-Band Spectral Subtraction Involving Magnitude and Phase Components

FPGA based High Speed Data Acquisition System for High Energy Physics Application

FPGA based Novel High Speed DAQ System Design with Error Correction

High speed fault tolerant secure communication for muon chamber using fpga based gbt emulator

An Approach for Reducing Outliers of Non Local Means Image Denoising Filter

Design space exploration for image processing architectures on FPGA targets

Fault Detection for RC4 Algorithm and its Implementation on FPGA Platform

Hardware Implementation of four byte per clock RC4 algorithm

Multi Core SSL/TLS Security Processor Architecture Prototype Design with automated Preferential Algorithm in FPGA

Performance Evaluation of ECC in Single and Multi Processor Architectures on FPGA Based Embedded System

A simple 1-byte 1-clock RC4 design and its efficient implementation in FPGA coprocessor for secured ethernet communication

A Synthesis Method for Quaternary Quantum Logic Circuits

Architecture for real time continuous sorting on large width data volume for fpga based applications

Cobb Angle Measurement of Scoliosis with Reduced Variability

Design and implementation of real time AES-128 on real time operating system for multiple FPGA communication

Linear Nearest Neighbor Synthesis of Reversible Circuits by Graph Partitioning

Outlier detection from ETL Execution trace

Outlier Detection Techniques for SQL and ETL Tuning

Quantum Ternary Circuit Synthesis Using Projection Operations

Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

Multiple View Reconstruction of Calibrated Images using Singular Value Decomposition