Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
22 works
0 followers
11 topics
4 close collaborators

Published work

22 published items

Preprint, 2026, arXiv

A Gap Between Decision Trees and Neural Networks

We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($\mathrm{R}\mathrm{TV}$) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator $1_A$ has infinite $\mathrm{R}\mathrm{TV}$. Moreover, two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, also have infinite $\mathrm{R}\mathrm{TV}$ in dimensions $d>1$, while Gaussian convolution yields finite $\mathrm{R}\mathrm{TV}$ but with an explicit exponential dependence on $d$. We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to $1_A$). For classification, we construct a smooth barrier score $S_A$ with finite $\mathrm{R}\mathrm{TV}$ whose fixed threshold $\tau=1$ exactly recovers the box. Under a mild tube-mass condition near $\partial A$, we prove an $L_1(P)$ calibration bound that decays polynomially in a sharpness parameter, along with an explicit $\mathrm{R}\mathrm{TV}$ upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy-complexity tradeoff and how threshold selection shifts where training lands along it.
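To make the objects above concrete, here is a minimal Python sketch of the hard box indicator $1_A$ and one split-wise ramp surrogate. Assumptions not taken from the abstract: $A$ is a single axis-aligned box $\prod_i [lo_i, hi_i]$, the function names are hypothetical, and "split-wise" ramp smoothing is read as replacing each threshold test by a linear ramp of width `width` and multiplying the soft tests. The barrier score $S_A$ is not specified in the abstract, so it is not reproduced here.

```python
from typing import Sequence


def box_indicator(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Hard tree indicator 1_A for the axis-aligned box A = prod_i [lo_i, hi_i]."""
    return float(all(lo_i <= x_i <= hi_i for x_i, lo_i, hi_i in zip(x, lo, hi)))


def ramp(t: float, width: float) -> float:
    """Piecewise-linear step: 0 for t <= -width/2, 1 for t >= width/2, linear in between."""
    return min(1.0, max(0.0, t / width + 0.5))


def ramp_smoothed_box(x: Sequence[float], lo: Sequence[float], hi: Sequence[float],
                      width: float = 0.1) -> float:
    """Hypothetical split-wise surrogate (an assumed reading of the abstract): each
    test lo_i <= x_i and x_i <= hi_i is replaced by a ramp, and the soft tests are
    multiplied just as the hard tests are combined by logical AND."""
    score = 1.0
    for x_i, lo_i, hi_i in zip(x, lo, hi):
        score *= ramp(x_i - lo_i, width) * ramp(hi_i - x_i, width)
    return score
```

The surrogate agrees with `box_indicator` except in a thin layer around the faces of the box whose thickness is set by `width`; according to the abstract, this kind of split-wise smoothing still has infinite $\mathrm{R}\mathrm{TV}$ when $d>1$.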

Preprint, 2026, arXiv

Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

The design of embedded safety-critical systems, such as those used in next-generation automotive and autonomous platforms, is increasingly challenged by escalating system complexity, hardware-software heterogeneity, and the integration of intelligent, data-driven components. Ensuring dependability in such systems requires a holistic approach that spans multiple abstraction layers and encompasses both design-time and run-time assurance. Traditional methods for reliability, safety, and security management often fall short in addressing the dynamic and uncertain behaviors introduced by Artificial Intelligence (AI) and Machine Learning (ML) components, especially under stringent real-time, power, and safety constraints. While AI and ML offer powerful predictive, adaptive, and self-optimizing capabilities that can enhance system dependability, their inherent non-determinism, data dependence, and lack of formal guarantees introduce new challenges for verification, validation, and certification. This paper explores emerging methodologies, architectures, and frameworks for designing dependable autonomous and embedded systems in the era of AI. It highlights advances in reliability modeling, secure system design, and certification approaches that account for imperfect, learning-enabled components, aiming to bridge the gap between AI innovation and certifiable system-level dependability.

Preprint, 2026, arXiv

Giant Damping-like Spin-Torque Conductivity in a GeTe/Py van der Waals Heterostructure

Recent observations of large unconventional spin-orbit torques in van der Waals (vdW) materials are driving intense interest in energy-efficient spintronic applications. A key limitation of ferromagnet (FM)/vdW heterostructures is their lower damping-like torque conductivity ($\sigma_{\rm DL}^{y}$) compared to conventional heavy-metal-based systems, which limits their prospects for commercial spintronic devices. Here, we report both a giant $\sigma_{\rm DL}^{y}$ of $-(1.25 \pm 0.11)\times 10^{5}~(\hbar/2e)~\Omega^{-1}\,\mathrm{m}^{-1}$ and an unconventional spin-orbit torque in a heterostructure comprising an FM (Ni$_{80}$Fe$_{20}$) and the vdW material GeTe. This value of $\sigma_{\rm DL}^{y}$ is the highest torque conductivity reported for any FM/vdW interface and is comparable to benchmark heavy-metal heterostructures. First-principles calculations reveal that this substantial torque originates from the cooperative interplay of the spin Hall effect, orbital Hall effect, and orbital Rashba effect, assisted by interfacial charge transfer. These findings demonstrate the potential of carefully engineered vdW heterostructures to achieve highly efficient electrical manipulation of magnetization at room temperature, paving the way for next-generation low-power spintronic devices.

Preprint, 2026, arXiv

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark

Existing benchmarks for Vision-Language Models (VLMs) primarily evaluate spatio-temporal understanding on simple single-action videos, closed attribute sets, and restricted entity types, failing to capture the free-form, multi-action interactions between diverse entities that characterize real-world video understanding. Furthermore, the lack of a systematic framework for analyzing model failures across complementary spatio-temporal axes hinders comprehensive evaluation. To address these gaps, we introduce VISTA, a Video Interaction Spatio-Temporal Analysis benchmark designed for open-set, multi-entity, and multi-action spatio-temporal understanding in VLMs. VISTA decomposes videos into interpretable entities, their associated actions, and relational dynamics, enabling multi-axis diagnostics and unified assessment of relational, spatial, and temporal understanding. Our benchmark integrates multiple datasets into a single interaction-aware taxonomy and comprises ~12K curated video-query pairs spanning diverse scenes and complexities. We systematically evaluate 11 state-of-the-art VLMs on VISTA and break down aggregate performance across our taxonomy to reveal shortcomings and pronounced spatio-temporal biases obscured by traditional metrics. By providing detailed, taxonomy-driven diagnostics on a challenging dataset, VISTA offers a nuanced framework to guide advances in model design, pretraining strategies, and evaluation protocols. Overall, VISTA is the first large-scale, interaction-aware diagnostic benchmark for spatio-temporal understanding in VLMs.

Preprint, 2022, arXiv

Combining Gradients and Probabilities for Heterogeneous Approximation of Neural Networks

This work explores the search for heterogeneous approximate multiplier configurations for neural networks that produce high accuracy and low energy consumption. We discuss the validity of additive Gaussian noise, added to accurate neural network computations, as a surrogate model for behavioral simulation of approximate multipliers. The continuous and differentiable properties of the solution space spanned by the additive Gaussian noise model are used as a heuristic that generates meaningful estimates of layer robustness without the need for combinatorial optimization techniques. Instead, the amount of noise injected into the accurate computations is learned during network training using backpropagation. A probabilistic model of the multiplier error is presented to bridge the gap between the domains; the model estimates the standard deviation of the approximate multiplier error, connecting solutions in the additive Gaussian noise space to actual hardware instances. Our experiments show that the combination of heterogeneous approximation and neural network retraining reduces the energy consumption for multiplications by 70% to 79% for different ResNet variants on the CIFAR-10 dataset with a Top-1 accuracy loss below one percentage point. For the more complex Tiny ImageNet task, our VGG16 model achieves a 53% reduction in energy consumption with a drop in Top-5 accuracy of 0.5 percentage points. We further demonstrate that our error model can predict the parameters of an approximate multiplier in the context of the commonly used additive Gaussian noise (AGN) model with high accuracy. Our software implementation is available at https://github.com/etrommer/agn-approx.
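The additive Gaussian noise (AGN) surrogate can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation (which lives in the repository linked above): the noise standard deviation is assumed to be `sigma` times the standard deviation of the accurate outputs, `sigma` is a fixed argument rather than a parameter learned by backpropagation, and `relative_error_std` is a hypothetical, purely empirical stand-in for the paper's probabilistic multiplier error model.

```python
import random
import statistics
from typing import Callable, List, Sequence


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Accurate matrix-vector product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def agn_matvec(w: Sequence[Sequence[float]], x: Sequence[float],
               sigma: float, rng: random.Random) -> List[float]:
    """Accurate product plus additive Gaussian noise (AGN).

    Assumption: the noise std is sigma times the population std of the accurate
    outputs, so sigma acts as a relative noise level for the whole layer."""
    y = matvec(w, x)
    scale = statistics.pstdev(y)
    return [y_i + rng.gauss(0.0, sigma * scale) for y_i in y]


def relative_error_std(approx_mul: Callable[[int, int], int],
                       operands: Sequence[int]) -> float:
    """Hypothetical empirical error model: std of the relative error of an
    approximate multiplier over all pairs of nonzero operands."""
    errors = [(approx_mul(a, b) - a * b) / (a * b)
              for a in operands for b in operands if a and b]
    return statistics.pstdev(errors)
```

For example, `relative_error_std(lambda a, b: (a * b) >> 2 << 2, range(1, 16))` measures the spread of a hypothetical multiplier that zeroes the two lowest product bits, while an exact multiplier gives 0.0.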

Preprint, 2022, arXiv

End-to-End Semi-Supervised Learning for Video Action Detection

In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. We propose a simple end-to-end consistency-based approach that effectively utilizes the unlabeled data. Video action detection requires both action class prediction and spatio-temporal localization of actions. Therefore, we investigate two types of constraints: classification consistency and spatio-temporal consistency. The presence of predominant background and static regions in a video makes it challenging to utilize spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency: (1) temporal coherency and (2) gradient smoothness. Both exploit the temporal continuity of actions in videos and are found to be effective for utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we show the effectiveness of the proposed approach for video object segmentation on YouTube-VOS, which demonstrates its generalization capability. The proposed approach achieves competitive performance using merely 20% of the annotations on UCF101-24 when compared with recent fully supervised methods. On UCF101-24, it improves f-mAP and v-mAP at 0.5 by +8.9% and +11%, respectively, compared to the supervised approach.

Preprint, 2022, arXiv

Exact recovery algorithm for Planted Bipartite Graph in Semi-random Graphs

The problem of finding the largest induced balanced bipartite subgraph in a given graph is NP-hard. This problem is closely related to the problem of finding the smallest Odd Cycle Transversal. In this work, we consider the following model of instances: starting with a set $V$ of $n$ vertices, a set $S \subseteq V$ of $k$ vertices is chosen and an arbitrary $d$-regular bipartite graph is added on it; edges between pairs of vertices in $S \times (V \setminus S)$ and $(V \setminus S) \times (V \setminus S)$ are added with probability $p$. Since for $d=0$ the problem reduces to recovering a planted independent set, we do not expect efficient algorithms for $k=o(\sqrt{n})$. This problem is a generalization of the planted balanced biclique problem, where the bipartite graph induced on $S$ is a complete bipartite graph; [Lev18] gave an algorithm for recovering $S$ in this problem when $k=\Omega(\sqrt{n})$. Our main result is an efficient algorithm that recovers (w.h.p.) the planted bipartite graph when $k=\Omega_p(\sqrt{n \log n})$ for a large range of parameters. Our results also hold for a natural semi-random model of instances, which involves the presence of a monotone adversary. Our proof shows that a natural SDP relaxation for the problem is integral by constructing an appropriate solution to its dual formulation. Our main technical contribution is a new approach for constructing the dual solution, where we calibrate the eigenvectors of the adjacency matrix to be the eigenvectors of the dual matrix. We believe that this approach may have applications to other recovery problems in semi-random models as well. When $k=\Omega(\sqrt{n})$, we give an algorithm for recovering $S$ whose running time is exponential in the number of small eigenvalues in the graph induced on $S$; this algorithm is based on subspace enumeration techniques due to the works of [KT07,ABS10,Kol11].
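The instance model described at the start of the abstract is concrete enough to sample. A minimal sketch, assuming that the "arbitrary" $d$-regular bipartite graph is a circulant one between two equal halves of $S$ (so $k$ must be even and $d \le k/2$); the function name and this particular choice of planted graph are illustrative, and neither the monotone adversary nor any recovery algorithm (SDP or otherwise) is modeled.

```python
import random
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def planted_bipartite_instance(n: int, k: int, d: int, p: float, seed: int = 0
                               ) -> Tuple[Set[int], List[int], List[int], Set[Edge]]:
    """Sample one instance of the planted model (hypothetical helper).

    Assumption for illustration: S is split into halves `left` and `right`, and
    left[i] is joined to right[(i + j) % (k // 2)] for j = 0..d-1, which is a
    d-regular bipartite graph whenever d <= k // 2."""
    assert k % 2 == 0 and k <= n and 0 <= d <= k // 2
    rng = random.Random(seed)
    vertices = list(range(n))
    S = rng.sample(vertices, k)
    half = k // 2
    left, right = S[:half], S[half:]
    edges: Set[Edge] = set()
    for i in range(half):
        for j in range(d):
            u, v = left[i], right[(i + j) % half]
            edges.add((min(u, v), max(u, v)))
    in_S = set(S)
    for u, v in combinations(vertices, 2):  # u < v because `vertices` is sorted
        if u in in_S and v in in_S:
            continue  # pairs inside S only get the planted edges
        if rng.random() < p:  # pairs touching V \ S get an edge with probability p
            edges.add((u, v))
    return in_S, left, right, edges
```

Because random edges never fall inside $S$, the subgraph induced on $S$ is exactly the planted bipartite graph, which is the structure the recovery algorithm has to find.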

Preprint, 2022, arXiv

Large spin-to-charge conversion at the two-dimensional interface of transition metal dichalcogenides and permalloy

Spin-to-charge conversion is an essential requirement for the implementation of spintronic devices. Recently, monolayers of semiconducting transition metal dichalcogenides (TMDs) have attracted considerable interest for spin-to-charge conversion due to their strong spin-orbit coupling and the lack of inversion symmetry in their crystal structure. However, reports of direct measurements of spin-to-charge conversion at TMD-based interfaces are very limited. Here, we report the room-temperature observation of a large spin-to-charge conversion arising from the interface of Ni$_{80}$Fe$_{20}$ (Py) with four distinct large-area ($\sim 5\times2$~mm$^2$) monolayer (ML) TMDs, namely MoS$_2$, MoSe$_2$, WS$_2$, and WSe$_2$. We show that both the spin mixing conductance and the Rashba efficiency parameter ($\lambda_{\rm IREE}$) scale with the spin-orbit coupling strength of the ML TMD layers. The $\lambda_{\rm IREE}$ parameter is found to range between $-0.54$ and $-0.76$ nm for the four monolayer TMDs, demonstrating a large spin-to-charge conversion. Our findings reveal that TMD/ferromagnet interfaces can be used for efficient generation and detection of spin current, opening new opportunities for novel spintronic devices.

Preprint, 2022, arXiv

RAPID: AppRoximAte Pipelined Soft Multipliers and Dividers for High-Throughput and Energy-Efficiency

The rapid evolution of error-resilient applications, along with their demand for high throughput, has motivated the design of fast approximate functional units for Field-Programmable Gate Arrays (FPGAs). Studies that proposed imprecise functional techniques suffer from three shortcomings. First, most inexact multipliers and dividers are specialized for Application-Specific Integrated Circuit (ASIC) platforms. Second, state-of-the-art (SoA) approximate units are mostly substituted into only a single kernel of a multi-kernel application; moreover, the end-to-end assessment covers the Quality of Results (QoR) but not the overall performance gained. Finally, existing imprecise components are not designed to support a pipelined approach, which could boost the operating frequency/throughput of, e.g., division-included applications. In this paper, we propose RAPID, the first pipelined approximate multiplier and divider architecture customized for FPGAs. The proposed units efficiently utilize 6-input Look-up Tables (6-LUTs) and fast carry chains to implement Mitchell's approximate algorithms. Our novel error-refinement scheme not only has negligible overhead over the baseline Mitchell approach but also boosts its accuracy to 99.4% for arbitrary sizes of multiplication and division. Experimental results demonstrate the efficiency of the proposed pipelined and non-pipelined RAPID multipliers and dividers over accurate counterparts. Moreover, the end-to-end evaluations of RAPID, deployed in three multi-kernel applications in the domains of bio-signal processing, image processing, and moving object tracking for Unmanned Air Vehicles (UAV), indicate up to 45% improvements in area, latency, and Area-Delay-Product (ADP), respectively, over accurate kernels, with negligible loss in QoR.
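Mitchell's logarithmic approximation, which RAPID builds on, can be sketched in floating point as follows. This covers only the baseline Mitchell multiplier and divider; RAPID's LUT and carry-chain mapping, pipelining, and error-refinement scheme are not reproduced, and the function names are hypothetical. Each positive operand $a$ is written as $2^{k}(1+x)$ with $0 \le x < 1$, $\log_2 a$ is approximated by $k + x$, and the antilogarithm is approximated piecewise linearly.

```python
from typing import Tuple


def _log2_approx(a: int) -> Tuple[int, float]:
    """Mitchell's approximation log2(a) ~ k + x, with k = floor(log2 a) and x in [0, 1)."""
    k = a.bit_length() - 1
    x = a / (1 << k) - 1.0
    return k, x


def mitchell_mul(a: int, b: int) -> float:
    """Baseline Mitchell product of two positive integers (no error refinement)."""
    k1, x1 = _log2_approx(a)
    k2, x2 = _log2_approx(b)
    s = x1 + x2
    if s < 1.0:
        return 2.0 ** (k1 + k2) * (1.0 + s)
    # The fractional sum overflowed into the next power of two.
    return 2.0 ** (k1 + k2 + 1) * s


def mitchell_div(a: int, b: int) -> float:
    """Baseline Mitchell quotient a / b of two positive integers (no error refinement)."""
    k1, x1 = _log2_approx(a)
    k2, x2 = _log2_approx(b)
    t = x1 - x2
    if t >= 0.0:
        return 2.0 ** (k1 - k2) * (1.0 + t)
    # Borrow one from the exponent so the mantissa term stays in [1, 2).
    return 2.0 ** (k1 - k2 - 1) * (2.0 + t)
```

For instance, `mitchell_mul(3, 3)` returns 8.0 instead of 9, and `mitchell_div(9, 3)` returns 3.25 instead of 3. Operands that are powers of two are handled exactly, and the baseline multiplier never overestimates the exact product.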

Preprint, 2022, arXiv

Video Action Detection: Analysing Limitations and Challenges

Beyond being large enough to feed data-hungry machines (e.g., transformers), what attributes measure the quality of a dataset? Assuming that such attributes can be defined, how do we quantify their relative presence? Our work attempts to explore these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action class. We first analyze the existing datasets on video action detection and discuss their limitations. Next, we propose a new dataset, Multi Actor Multi Action (MAMA), which overcomes these limitations and is more suitable for real-world applications. In addition, we perform a bias study that analyzes a key property differentiating videos from static images: the temporal aspect. This reveals whether the actions in these datasets really need the motion information of an actor, or whether the occurrence of an action can be predicted from a single frame. Finally, we investigate the widely held assumptions on the importance of temporal ordering: is temporal ordering important for detecting these actions? Such extreme experiments show the existence of biases that have managed to creep into existing methods in spite of careful modeling.

Preprint, 2021, arXiv

Energy-efficient W$_{\text{100-x}}$Ta$_{\text{x}}$/CoFeB/MgO spin Hall nano-oscillators

We investigate a W-Ta alloying route to reduce the auto-oscillation current densities and the power consumption of nano-constriction-based spin Hall nano-oscillators (SHNOs). Using spin-torque ferromagnetic resonance (ST-FMR) measurements on microbars of W$_{\text{100-x}}$Ta$_{\text{x}}$(5 nm)/CoFeB(t)/MgO stacks with t = 1.4, 1.8, and 2.0 nm, we measure a substantial improvement in both the spin-orbit torque efficiency and the spin Hall conductivity. We demonstrate a 34% reduction in threshold auto-oscillation current density, which translates into a 64% reduction in power consumption compared to pure W-based SHNOs. Our work demonstrates the promising aspects of W-Ta alloying for the energy-efficient operation of emerging spintronic devices.

Preprint, 2021, arXiv

Fabrication of voltage gated spin Hall nano-oscillators

We demonstrate an optimized fabrication process for electric-field (voltage-gate) controlled nano-constriction spin Hall nano-oscillators (SHNOs), achieving feature sizes of <30 nm with the easy-to-handle negative-tone e-beam lithography resist ma-N 2401. For the nanoscopic voltage gates, we utilize a two-step tilted ion beam etching approach and through-hole encapsulation using 30 nm HfO$_x$. The optimized tilted etching process reduces sidewalls by 75% compared to no tilting. Moreover, the HfO$_x$ encapsulation avoids any sidewall shunting and improves gate breakdown. Our experimental results on W/CoFeB/MgO/SiO$_2$ SHNOs show significant frequency tunability (6 MHz/V) even for moderate perpendicular magnetic anisotropy. Circular patterns with a diameter of 45 nm are achieved with an aspect ratio better than 0.85 for 80% of the population. The optimized fabrication process allows a large number of individual gates to be incorporated to interface with SHNO arrays for unconventional computing and densely packed spintronic neural networks.

Preprint, 2021, arXiv

SeqL: Secure Scan-Locking for IP Protection

Existing logic-locking attacks are known to successfully decrypt the functionally correct key of a locked combinational circuit. It is possible to extend these attacks to real-world silicon-based Intellectual Properties (IPs, which are sequential circuits) through scan chains by selectively initializing the combinational logic and analyzing the responses. In this paper, we propose SeqL, which achieves functional isolation and locks selective flip-flop functional-input/scan-output pairs, thus rendering the decrypted key functionally incorrect. We conduct a formal study of the scan-locking problem and demonstrate how our proposed defense can be applied automatically to any given IP. We show that SeqL hides functionally correct keys from the attacker, thereby increasing the likelihood that the decrypted key is functionally incorrect. When tested on pipelined combinational benchmarks (ISCAS, MCNC), sequential benchmarks (ITC), and a fully fledged RISC-V CPU, SeqL gave 100% resilience to a broad range of state-of-the-art attacks, including SAT[1], Double-DIP[2], HackTest[3], SMT[4], FALL[5], Shift-and-Leak[6], and Multi-cycle attacks[7].

Preprint, 2021, arXiv

The Teaching Dimension of Kernel Perceptron

Algorithmic machine teaching has been studied under the linear setting, where exact teaching is possible. However, little is known about teaching nonlinear learners. Here, we establish the sample complexity of teaching, a.k.a. the teaching dimension, for kernelized perceptrons for different families of feature maps. As a warm-up, we show that the teaching complexity is $\Theta(d)$ for the exact teaching of linear perceptrons in $\mathbb{R}^d$, and $\Theta(d^k)$ for kernel perceptrons with a polynomial kernel of order $k$. Furthermore, under certain smoothness assumptions on the data distribution, we establish a rigorous bound on the complexity of approximately teaching a Gaussian kernel perceptron. We provide numerical examples of the optimal (approximate) teaching set under several canonical settings for linear, polynomial, and Gaussian kernel perceptrons.
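One way to see where the $\Theta(d^k)$ scaling comes from is to count the dimension of the feature space of a polynomial kernel: a kernel perceptron is a linear perceptron in that space, and the linear case needs $\Theta(d)$ teaching examples. This is only a heuristic consistency check under that reading, not the paper's argument, and `poly_feature_dim` is a hypothetical helper.

```python
from math import comb


def poly_feature_dim(d: int, k: int, homogeneous: bool = False) -> int:
    """Number of monomials spanning the feature space of a degree-k polynomial kernel on R^d.

    homogeneous=True  -> kernel (x . y)^k: monomials of degree exactly k, C(d + k - 1, k)
    homogeneous=False -> kernel (x . y + 1)^k: monomials of degree at most k, C(d + k, k)
    """
    return comb(d + k - 1, k) if homogeneous else comb(d + k, k)
```

For fixed $k$, both counts grow like $d^k/k!$; for example, `poly_feature_dim(100, 2, homogeneous=True)` is 5050, and with $k=1$ the homogeneous count reduces to $d$, matching the linear case.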

Preprint, 2021, arXiv

Ultrathin ferrimagnetic GdFeCo films with very low damping

Ferromagnetic materials dominate as the magnetically active element in spintronic devices, but come with drawbacks such as large stray fields and low operational frequencies. Compensated ferrimagnets provide an alternative, as they combine the ultrafast magnetization dynamics of antiferromagnets with a ferromagnet-like spin-orbit torque (SOT) behavior. However, to use ferrimagnets in spintronic devices, their advantageous properties must also be retained in ultrathin films (t < 10 nm). In this study, ferrimagnetic Gd$_x$(Fe$_{87.5}$Co$_{12.5}$)$_{1-x}$ thin films in the thickness range t = 2-20 nm were grown on high-resistance Si(100) substrates and studied using broadband ferromagnetic resonance measurements at room temperature. By tuning their stoichiometry, a nearly compensated behavior is observed for the first time in 2 nm Gd$_x$(Fe$_{87.5}$Co$_{12.5}$)$_{1-x}$ ultrathin films, with an effective magnetization of $M_{\rm eff}$ = 0.02 T and a low effective Gilbert damping constant of $\alpha$ = 0.0078, comparable to the lowest values reported so far in 30 nm films. These results show great promise for the development of ultrafast and energy-efficient ferrimagnetic spintronic devices.

Preprint, 2020, arXiv

Detecting Deepfakes with Metric Learning

With the arrival of several face-swapping applications such as FaceApp, Snapchat, MixBooth, FaceBlender, and many more, the authenticity of digital media content hangs by a very loose thread. On social media platforms, videos are widely circulated, often at a high compression factor. In this work, we analyze several deep learning approaches for deepfake classification in high-compression scenarios and demonstrate that a proposed approach based on metric learning can be very effective for such classification. Using fewer frames per video to assess its realism, the metric learning approach with a triplet network architecture proves fruitful. It learns to increase the feature-space distance between the clusters of real and fake video embedding vectors. We validated our approaches on two datasets to analyze their behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and an accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms, where data compression is inevitable.
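The triplet objective behind such networks can be written in a few lines. A minimal sketch, assuming squared Euclidean distances and a margin of 1.0 (the abstract specifies neither); `triplet_loss` is the generic textbook formulation, not the authors' training code. The anchor and positive would be embeddings of two videos from the same class (both real or both fake), and the negative an embedding from the other class.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Triplet margin loss: zero once the negative is farther from the anchor than
    the positive by at least `margin` (the margin value is an assumption)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)
```

Minimizing this loss over many triplets pulls same-class embeddings together and pushes the real and fake clusters apart, which is the effect the abstract describes.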

Preprint, 2020, arXiv

Direct measurement of interfacial Dzyaloshinskii-Moriya interaction at the MoS$_{\rm 2}$/Ni$_{80}$Fe$_{20}$ interface

We report a direct measurement of sizable interfacial Dzyaloshinskii-Moriya interaction (iDMI) at the interface of the two-dimensional transition metal dichalcogenide (2D-TMD) MoS$_{\rm 2}$ and Ni$_{80}$Fe$_{20}$ (Py) using Brillouin light scattering spectroscopy. A clear asymmetry in the spin-wave dispersion is measured in MoS$_{\rm 2}$/Py/Ta, while no such asymmetry is detected in the reference Py/Ta system. A linear scaling of the DMI constant with the inverse of the Py thickness indicates the interfacial origin of the observed DMI. We further observe a 56% enhancement of the DMI constant in three- to four-layer MoS$_{\rm 2}$/Py compared to two-layer MoS$_{\rm 2}$/Py, which is caused by a higher density of MoO$_{\rm 3}$ defect species in the three- to four-layer MoS$_{\rm 2}$. The results open possibilities for spin-orbitronic applications utilizing 2D-TMD-based heterostructures.

Preprint, 2020, arXiv

Near Memory Acceleration on High Resolution Radio Astronomy Imaging

Modern radio telescopes like the Square Kilometer Array (SKA) will need to process exabytes of radio-astronomical signals in real time to construct a high-resolution map of the sky. Near-Memory Computing (NMC) could alleviate the performance bottlenecks caused by frequent memory accesses in a state-of-the-art radio-astronomy imaging algorithm. In this paper, we use CPI breakdown analysis on IBM Power9 to show that a sub-module performing a two-dimensional fast Fourier transform (2D FFT) is memory bound. We then present an NMC approach on FPGA for 2D FFT that outperforms a CPU by up to 120x and performs comparably to a high-end GPU, while using less bandwidth and memory.
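A 2D FFT such as the one profiled above is commonly computed with the row-column method: 1D FFTs over all rows, then over all columns. The sketch below is a plain-Python illustration of that decomposition, assuming power-of-two sizes; it says nothing about the paper's FPGA design, and the function names are hypothetical. In a row-major array, the column pass touches memory with a stride of one full row, which is one common reason this kernel stresses memory bandwidth more than arithmetic.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(grid: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """2D FFT by the row-column method: transform every row, then every column."""
    rows = [fft(r) for r in grid]
    cols = [fft(list(c)) for c in zip(*rows)]  # zip(*...) transposes to reach the columns
    return [list(r) for r in zip(*cols)]       # transpose back to row-major order
```

As a quick check, the transform of a unit impulse at index (0, 0) is 1 at every frequency, and a constant 4 x 4 grid of ones maps to 16 at (0, 0) and 0 elsewhere.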

Preprint, 2020, arXiv

Syn2Real: Forgery Classification via Unsupervised Domain Adaptation

In recent years, image manipulation has become increasingly accessible and its results increasingly natural-looking, owing to modern image processing and computer vision tools. As a result, identifying forged images has become very challenging. Among the different types of forgery, copy-move forgeries are increasing manyfold because this kind of tampering is difficult to detect. Publicly available datasets are insufficient to tackle such problems. In this paper, we propose to create a synthetic forged dataset using deep semantic image inpainting and a copy-move forgery algorithm. However, models trained on these datasets show a significant drop in performance when tested on more realistic data. To alleviate this problem, we use unsupervised domain adaptation networks to detect copy-move forgery in new domains by mapping the feature space from our synthetically generated dataset. Furthermore, we improved the F1 score on the CASIA and CoMoFoD datasets to 80.3% and 78.8%, respectively. Our approach can be helpful in cases where labeled data is unavailable.
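The copy-move step of such synthetic data generation can be illustrated on a single-channel image stored as nested lists. A minimal sketch with hypothetical names: the patch location and size are caller-chosen parameters rather than anything from the paper, and the deep semantic inpainting step mentioned in the abstract is not included. The returned binary mask marks the pasted region, which a synthetic pipeline gets for free as ground truth.

```python
from typing import List, Tuple

Image = List[List[int]]


def copy_move(img: Image, src: Tuple[int, int], dst: Tuple[int, int],
              size: Tuple[int, int]) -> Tuple[Image, Image]:
    """Return a forged copy of `img` with the h x w patch at `src` (row, col) pasted
    at `dst`, plus a 0/1 mask of the tampered pixels. `img` itself is not modified."""
    h, w = size
    (sy, sx), (dy, dx) = src, dst
    H, W = len(img), len(img[0])
    for y, x in (src, dst):
        if not (0 <= y and y + h <= H and 0 <= x and x + w <= W):
            raise ValueError("patch does not fit inside the image")
    forged = [row[:] for row in img]
    mask = [[0] * W for _ in range(H)]
    for i in range(h):
        for j in range(w):
            # Read from the untouched original so overlapping src/dst patches are safe.
            forged[dy + i][dx + j] = img[sy + i][sx + j]
            mask[dy + i][dx + j] = 1
    return forged, mask
```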

Preprint, 2016, arXiv

Testing $k$-Monotonicity

A Boolean $k$-monotone function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical monotone functions, which are the $1$-monotone functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following:
- We demonstrate a separation between testing $k$-monotonicity and testing monotonicity on the hypercube domain $\{0,1\}^d$, for $k\geq 3$;
- We demonstrate a separation between testing and learning on $\{0,1\}^d$, for $k=\omega(\log d)$: testing $k$-monotonicity can be performed with $2^{O(\sqrt d \cdot \log d\cdot \log{1/\varepsilon})}$ queries, while learning $k$-monotone functions requires $2^{\Omega(k\cdot \sqrt d\cdot{1/\varepsilon})}$ queries (Blais et al. (RANDOM 2015));
- We present a tolerant test for functions $f\colon[n]^d\to \{0,1\}$ with complexity independent of $n$, which makes progress on a problem left open by Berman et al. (STOC 2014).
Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid $[n]^d$, and draw connections to distribution testing techniques.
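For intuition about the definition (not about the sublinear property tests themselves), $k$-monotonicity on the hypercube can be checked exactly by brute force in $O(2^d d)$ time with dynamic programming over the subset order. The sketch assumes the convention, implied by monotone functions being exactly the $1$-monotone ones, that every chain is read as if it started from the value $0$; the function names are hypothetical.

```python
from typing import Callable


def max_alternations(f: Callable[[int], int], d: int) -> int:
    """Largest number of 0/1 alternations of f along any ascending chain in {0,1}^d.

    Points are d-bit integers, and y precedes x when y's set bits are a subset of x's.
    Assumed convention: each chain is read as if preceded by a virtual 0, so monotone
    functions score exactly 1 (or 0 if constant 0). Refining a chain never removes
    alternations, so it suffices to follow chains that add one bit at a time."""
    best = [0] * (1 << d)
    for x in range(1 << d):  # clearing a bit yields a smaller integer, so predecessors come first
        fx = f(x)
        if x == 0:
            best[x] = fx  # virtual leading 0: one alternation iff f(0) = 1
            continue
        best[x] = max(best[x & ~(1 << i)] + (f(x & ~(1 << i)) != fx)
                      for i in range(d) if (x >> i) & 1)
    return max(best)


def is_k_monotone(f: Callable[[int], int], d: int, k: int) -> bool:
    """Exact (exponential-time) check that f alternates at most k times on every chain."""
    return max_alternations(f, d) <= k
```

For example, parity on $\{0,1\}^3$ alternates 3 times along $000 \prec 001 \prec 011 \prec 111$, so it is $3$-monotone but not $2$-monotone, while OR is $1$-monotone.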

Preprint, 2015, arXiv

Accelerating Non-volatile/Hybrid Processor Cache Design Space Exploration for Application Specific Embedded Systems

In this article, we propose a technique to accelerate design space exploration of non-volatile or hybrid (volatile and non-volatile) processor caches for application-specific embedded systems. Utilizing a novel cache behavior modeling equation and a new accurate cache miss prediction mechanism, our proposed technique can accelerate NVM or hybrid FIFO processor cache design space exploration for SPEC CPU 2000 applications by up to 249 times compared to the conventional approach.
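For context, the conventional approach that such techniques accelerate is exhaustive trace-driven simulation, in which every candidate cache configuration is simulated on the full address trace. A minimal sketch of one such simulation for a set-associative cache with FIFO replacement, assuming integer byte addresses and a modulo set index; it is not the modeling equation or miss-prediction mechanism proposed in the article, and the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List


def fifo_cache_misses(trace: Iterable[int], num_sets: int, ways: int, line_size: int) -> int:
    """Count misses of a set-associative cache with FIFO replacement on an address trace."""
    sets: List[Deque[int]] = [deque() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size        # cache-line (block) address
        s = sets[block % num_sets]       # assumed modulo mapping of blocks to sets
        if block in s:
            continue                     # hit: FIFO order is NOT updated on hits
        misses += 1
        if len(s) == ways:
            s.popleft()                  # evict the block that was inserted first
        s.append(block)
    return misses
```

Exploring a design space this way means one full simulation per (sets, ways, line size) tuple. On the trace `[0, 64, 0, 128, 0]` with one 2-way set and 64-byte lines, FIFO gives 4 misses where LRU would give 3, because the hit on block 0 does not protect it from eviction.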

Preprint, 2015, arXiv

TRISHUL: A Single-pass Optimal Two-level Inclusive Data Cache Hierarchy Selection Process for Real-time MPSoCs

Existing approaches analyze the execution time of a real-time application on all possible cache hierarchy setups to find the application-specific optimal two-level inclusive data cache hierarchy that reduces cost, space, and energy consumption while satisfying the time deadline in real-time Multiprocessor Systems on Chip. These brute-force-like approaches can take years to complete. Alternatively, memory-access-trace-driven crude estimation methods can find a cache hierarchy quickly by compromising the accuracy of the results. In this article, for the first time, we propose a fast and accurate trace-driven approach to find the optimal real-time application-specific two-level inclusive data cache hierarchy. Our proposed approach, TRISHUL, first predicts the optimal cache hierarchy performance and then uses that information to find the optimal cache hierarchy quickly. For the application traces analyzed, TRISHUL can suggest a cache hierarchy that is up to 128 times smaller and up to 7 times faster than the one suggested by the state-of-the-art crude trace-driven two-level inclusive cache hierarchy selection approach.