Researcher profile

Klara Nahrstedt

Klara Nahrstedt contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?

Inference time techniques such as decoding time scaling and self refinement have been shown to substantially improve mathematical reasoning in large language models (LLMs), largely attributed to emergent self correction and self verification behaviors often elicited through reinforcement learning (RL). In this work, we ask whether the same recipe transfers to vision language models (VLMs), especially RL finetuned variants that claim strong visual mathematical reasoning. Through extensive evaluation, we reach three main findings that differ markedly from text only models. First, generation time capability matters more than verification and refinement: simple majority voting consistently and substantially outperforms verification centric strategies such as best of N with self verification. Second, behaviors often associated with RL tuned models at inference time, such as the 'Aha moment,' do not yield reliable reasoning performance improvements. Third, visual information is not effectively integrated into the model's self verification process. Overall, our analysis highlights a key limitation: current RL trained VLMs derive limited benefit from self verification in the visual modality, which constrains the effectiveness of inference time scaling for visual mathematical reasoning.

preprint2026arXiv

LLM-ADAM: A Generalizable LLM Agent Framework for Pre-Print Anomaly Detection in Additive Manufacturing

Additive manufacturing (AM) continues to transform modern manufacturing by enabling flexible, on-demand production of complex geometries across diverse industries. Fused filament fabrication (FFF) has extended AM to laboratories, classrooms, and small production environments, but this accessibility shifts process-planning responsibility to users who may lack manufacturing expertise. A syntactically valid slicer profile can still encode thermally or geometrically harmful settings, and subtle G-code edits can alter extrusion, cooling, or adhesion before a print begins. Pre-print G-code screening catches accidental or adversarial machine-program errors before material or machine time is wasted. This paper proposes LLM-ADAM as a generalizable LLM framework for pre-print anomaly detection in AM. The framework decomposes the task into three roles: Extractor-LLM maps a G-code file to a structured process-parameter schema; Reference-LLM converts printer and material documentation into aligned operating ranges; and Judge-LLM interprets a deterministic deviation table and G-code evidence to decide whether a part is non-defective or belongs to an anomaly class. Printers, materials, and LLM backbones are interchangeable test conditions, not fixed assumptions. We evaluate the framework on an N=200 FFF G-code corpus spanning two desktop printer families, two materials, and five classes including non-defective, under-extrusion, over-extrusion, warping, and stringing. The best framework configuration reaches 87.5% accuracy, compared with 59.5% for the strongest engineered single-LLM baseline. The results show that structured decomposition, rather than backbone strength alone, is the dominant source of improvement, with defect classes identified at or near ceiling for leading configurations while residual errors concentrate on conservative false alarms for non-defective samples.

preprint2025arXiv

Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications

Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload. Once the prefill GPU completes its task, the KV cache must be transferred to the decode GPU. While existing works have proposed various KV cache transfer paths across different memory and storage tiers, there remains a lack of systematic benchmarking that compares their performance and energy efficiency. Meanwhile, although optimization techniques such as KV cache reuse and frequency scaling have been utilized for disaggregated serving, their performance and energy implications have not been rigorously benchmarked. In this paper, we fill this research gap by re-evaluating prefill-decode disaggregation under different KV transfer mediums and optimization strategies. Specifically, we include a new colocated serving baseline and evaluate disaggregated setups under different KV cache transfer paths. Through GPU profiling using dynamic voltage and frequency scaling (DVFS), we identify and compare the performance-energy Pareto frontiers across all setups to evaluate the potential energy savings enabled by disaggregation. Our results show that performance benefits from prefill-decode disaggregation are not guaranteed and depend on the request load and KV transfer mediums. In addition, stage-wise independent frequency scaling enabled by disaggregation does not lead to energy saving due to inherently higher energy consumption of disaggregated serving.

preprint2021arXiv

Robusta: Robust AutoML for Feature Selection via Reinforcement Learning

Several AutoML approaches have been proposed to automate the machine learning (ML) process, such as searching for the ML model architectures and hyper-parameters. However, these AutoML pipelines only focus on improving the learning accuracy of benign samples while ignoring the ML model robustness under adversarial attacks. As ML systems are increasingly being used in a variety of mission-critical applications, improving the robustness of ML systems has become of utmost importance. In this paper, we propose the first robust AutoML framework, Robusta--based on reinforcement learning (RL)--to perform feature selection, aiming to select features that lead to both accurate and robust ML systems. We show that a variation of the 0-1 robust loss can be directly optimized via an RL-based combinatorial search in the feature selection scenario. In addition, we employ heuristics to accelerate the search procedure based on feature scoring metrics, which are mutual information scores, tree-based classifiers feature importance scores, F scores, and Integrated Gradient (IG) scores, as well as their combinations. We conduct extensive experiments and show that the proposed framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples compared with other feature selection methods.

preprint2020arXiv

Nanotechnology-inspired Information Processing Systems of the Future

Nanoscale semiconductor technology has been a key enabler of the computing revolution. It has done so via advances in new materials and manufacturing processes that resulted in the size of the basic building block of computing systems - the logic switch and memory devices - being reduced into the nanoscale regime. Nanotechnology has provided increased computing functionality per unit volume, energy, and cost. In order for computing systems to continue to deliver substantial benefits for the foreseeable future to society at large, it is critical that the very notion of computing be examined in the light of nanoscale realities. In particular, one needs to ask what it means to compute when the very building block - the logic switch - no longer exhibits the level of determinism required by the von Neumann architecture. There needs to be a sustained and heavy investment in a nation-wide Vertically Integrated Semiconductor Ecosystem (VISE). VISE is a program in which research and development is conducted seamlessly across the entire compute stack - from applications, systems and algorithms, architectures, circuits and nanodevices, and materials. A nation-wide VISE provides clear strategic advantages in ensuring the US's global superiority in semiconductors. First, a VISE provides the highest quality seed-corn for nurturing transformative ideas that are critically needed today in order for nanotechnology-inspired computing to flourish. It does so by dramatically opening up new areas of semiconductor research that are inspired and driven by new application needs. Second, a VISE creates a very high barrier to entry from foreign competitors because it is extremely hard to establish, and even harder to duplicate.

preprint2020arXiv

Safety, Security, and Privacy Threats Posed by Accelerating Trends in the Internet of Things

The Internet of Things (IoT) is already transforming industries, cities, and homes. The economic value of this transformation across all industries is estimated to be trillions of dollars and the societal impact on energy efficiency, health, and productivity are enormous. Alongside potential benefits of interconnected smart devices comes increased risk and potential for abuse when embedding sensing and intelligence into every device. One of the core problems with the increasing number of IoT devices is the increased complexity that is required to operate them safely and securely. This increased complexity creates new safety, security, privacy, and usability challenges far beyond the difficult challenges individuals face just securing a single device. We highlight some of the negative trends that smart devices and collections of devices cause and we argue that issues related to security, physical safety, privacy, and usability are tightly interconnected and solutions that address all four simultaneously are needed. Tight safety and security standards for individual devices based on existing technology are needed. Likewise research that determines the best way for individuals to confidently manage collections of devices must guide the future deployments of such systems.

preprint2020arXiv

Serdab: An IoT Framework for Partitioning Neural Networks Computation across Multiple Enclaves

Recent advances in Deep Neural Networks (DNN) and Edge Computing have made it possible to automatically analyze streams of videos from home/security cameras over hierarchical clusters that include edge devices, close to the video source, as well as remote cloud compute resources. However, preserving the privacy and confidentiality of users' sensitive data as it passes through different devices remains a concern to most users. Private user data is subject to attacks by malicious attackers or misuse by internal administrators who may use the data in activities that are not explicitly approved by the user. To address this challenge, we present Serdab, a distributed orchestration framework for deploying deep neural network computation across multiple secure enclaves (e.g., Intel SGX). Secure enclaves provide a guarantee on the privacy of the data/code deployed inside it. However, their limited hardware resources make them inefficient when solely running an entire deep neural network. To bridge this gap, Serdab presents a DNN partitioning strategy to distribute the layers of the neural network across multiple enclave devices or across an enclave device and other hardware accelerators. Our partitioning strategy achieves up to 4.7x speedup compared to executing the entire neural network in one enclave.

preprint2020arXiv

SiEVE: Semantically Encoded Video Analytics on Edge and Cloud

Recent advances in computer vision and neural networks have made it possible for more surveillance videos to be automatically searched and analyzed by algorithms rather than humans. This happened in parallel with advances in edge computing where videos are analyzed over hierarchical clusters that contain edge devices, close to the video source. However, the current video analysis pipeline has several disadvantages when dealing with such advances. For example, video encoders have been designed for a long time to please human viewers and be agnostic of the downstream analysis task (e.g., object detection). Moreover, most of the video analytics systems leverage 2-tier architecture where the encoded video is sent to either a remote cloud or a private edge server but does not efficiently leverage both of them. In response to these advances, we present SIEVE, a 3-tier video analytics system to reduce the latency and increase the throughput of analytics over video streams. In SIEVE, we present a novel technique to detect objects in compressed video streams. We refer to this technique as semantic video encoding because it allows video encoders to be aware of the semantics of the downstream task (e.g., object detection). Our results show that by leveraging semantic video encoding, we achieve close to 100% object detection accuracy with decompressing only 3.5% of the video frames which results in more than 100x speedup compared to classical approaches that decompress every video frame.