Source author record

Alireza Nadali

Alireza Nadali appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Machine Learning, Artificial Intelligence, Computation and Language, eess.SP, eess.SY, Systems and Control

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

preprint, 2026, arXiv

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly produced keys and values, and passes the enlarged cache forward; the same one-step update is applied repeatedly, analogous to foldl in functional programming. Building on the KV cache concatenation primitive introduced for latent multi-agent communication, we repurpose it as a chunk-to-chunk recurrence for long-context inference. When processing chunk t, the model attends to the KV cache carried from earlier chunks as a prefix, reusing its internal state across segments without modifying or retraining the model. Despite its simplicity, the induced recurrence is stable: per-step drift rises briefly and then saturates into a flat plateau that persists across deep chains. This plateau is insensitive to a 10,000x change in numerical precision, robust across chunk sizes, and consistent across model families. At the task level, KV-Fold preserves exact information over long distances. On a needle-in-a-haystack benchmark, it achieves 100% exact-match retrieval across 152 trials spanning contexts from 16K to 128K tokens and chain depths up to 511 on Llama-3.1-8B, while remaining within the memory limits of a single 40GB GPU. Compared to streaming methods, which trade fidelity for bounded memory, KV-Fold maintains long-range retrieval while operating as a sequence of tractable forward passes. Overall, our results show that frozen pretrained transformers already support a stable form of KV-cache recurrence, providing a practical route to long-context inference without architectural changes or training.
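
The fold structure described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy single-layer attention in which each token vector acts as its own query, key and value, and the names `attend`, `fold_step` and `kv_fold` are hypothetical. It only shows how the cache serves as the accumulator of a left fold over chunks.

```python
import math
from functools import reduce


def attend(query, keys, values):
    """Softmax attention of one query vector over the cached keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def fold_step(state, chunk):
    """One-step update: process a chunk against the carried cache, then extend it."""
    keys, values, outputs = (list(part) for part in state)
    for token in chunk:
        # Assumption: toy projections, the token is its own key, value and query.
        keys.append(token)
        values.append(token)
        outputs.append(attend(token, keys, values))
    return keys, values, outputs


def kv_fold(tokens, chunk_size):
    """Left fold over sequence chunks with the KV cache as the accumulator."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    return reduce(fold_step, chunks, ([], [], []))


# Usage: ten 4-dimensional toy tokens processed in chunks of three.
tokens = [[math.sin(i + j) for j in range(4)] for i in range(10)]
keys, values, outputs = kv_fold(tokens, chunk_size=3)
print(len(keys), len(outputs))
```

In this toy, each token attends to every earlier token plus itself, so the chunk size does not change the outputs. It therefore cannot reproduce the per-step drift that the abstract measures on real models; it only makes the `foldl`-style recurrence concrete.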

preprint, 2022, arXiv

Maximum Entropy Dueling Network Architecture in Atari Domain

In recent years, many deep architectures have been proposed for reinforcement learning, mainly for value function estimation and representations. These methods have achieved great success in the Atari 2600 domain. In this paper, we propose an improved architecture based on Dueling Networks. This architecture has two separate estimators: one approximates the state value function and the other the state advantage function. The improvement, which is based on maximum entropy, shows better policy evaluation than the original network and other value-based architectures in the Atari domain.
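
As a rough illustration of the two-estimator design this abstract builds on, the sketch below combines a state value and per-action advantages using the mean-subtracted aggregation commonly associated with the original Dueling Network architecture. The function name `dueling_q_values` is hypothetical, and the maximum-entropy modification is not shown because the abstract does not specify it.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


# Usage: one state with three actions (made-up numbers for illustration only).
q_values = dueling_q_values(2.0, [1.0, 3.0, 2.0])
print(q_values)
```

Subtracting the mean advantage keeps the average of the Q-values equal to the state value, which is the usual way to make the value and advantage streams identifiable.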

preprint, 2020, arXiv

A Novel Method For Designing Transferable Soft Sensors And Its Application

In this paper, a new approach is proposed for designing transferable soft sensors. Soft sensing is one of the significant applications of data-driven methods in the condition monitoring of plants. While hard sensors can easily be used in various plants, soft sensors are confined to the specific plant they were designed for and cannot be used in a new plant, or even under new working conditions in the same plant. In this paper, a solution is proposed for this underlying obstacle in data-driven condition monitoring systems. Data-driven methods suffer from the fact that the distribution of the data used to construct a model may differ from the distribution of the data to which the model is applied. This ultimately leads to a decline in model accuracy. We propose a new transfer learning (TL) based regression method, called Domain Adversarial Neural Network Regression (DANN-R), and employ it for designing transferable soft sensors. We use data collected from the SCADA system of an industrial power plant to comprehensively investigate the functionality of the proposed method. The results show that the proposed transferable soft sensor can successfully adapt to new plants.