Source author record

Junxian Li

Junxian Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.NT · Computer Vision · Artificial Intelligence · math.PR

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators

Preprint · 2026 · arXiv

$δ$-mem: Efficient Online Memory for Large Language Models

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $δ$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. $δ$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an $8\times8$ online memory state, $δ$-mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$δ$-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
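The abstract describes the memory only as a fixed-size state matrix updated by delta-rule learning. As an illustration, here is the textbook delta rule $S \leftarrow S + \beta\,(v - Sk)\,k^\top$ applied to an $8\times8$ state, written as a plain-Python sketch. It is not the authors' code: the write strength `beta`, the helper names (`delta_write`, `read`, `one_hot`) and the use of one-hot keys are assumptions for illustration, and the step that turns the readout into low-rank attention corrections is omitted.

```python
# Illustrative delta-rule associative memory (a sketch, NOT the delta-mem implementation).
# Assumptions: plain Python lists, a scalar write strength `beta`, hypothetical helper
# names, and no coupling to attention (the low-rank correction step is left out).
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def zeros(d: int) -> Matrix:
    """Return a d x d all-zero state matrix."""
    return [[0.0] * d for _ in range(d)]


def matvec(S: Matrix, x: Vector) -> Vector:
    """Compute the matrix-vector product S @ x."""
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]


def delta_write(S: Matrix, k: Vector, v: Vector, beta: float) -> Matrix:
    """Delta rule: S <- S + beta * (v - S k) k^T.

    The update is driven by the prediction error for key k, so writing a value
    the memory already recalls correctly leaves S unchanged.
    """
    err = [vi - pi for vi, pi in zip(v, matvec(S, k))]
    return [[S[i][j] + beta * err[i] * k[j] for j in range(len(k))] for i in range(len(S))]


def read(S: Matrix, q: Vector) -> Vector:
    """Associative readout for query q."""
    return matvec(S, q)


def one_hot(d: int, i: int) -> Vector:
    """Unit key vector e_i of length d."""
    return [1.0 if j == i else 0.0 for j in range(d)]


if __name__ == "__main__":
    d = 8  # the 8x8 state size quoted in the abstract
    S = zeros(d)
    v0 = [0.5 * j for j in range(d)]
    v1 = [1.0] * d
    S = delta_write(S, one_hot(d, 0), v0, beta=1.0)
    S = delta_write(S, one_hot(d, 1), v1, beta=1.0)
    print(read(S, one_hot(d, 0)) == v0, read(S, one_hot(d, 1)) == v1)  # True True
```

With orthonormal keys and `beta = 1`, each write stores its value exactly; because the update is proportional to the error $v - Sk$, the state stays the same size no matter how many items are written, and older associations are only overwritten along the directions of new keys.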

Preprint · 2026 · arXiv

FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching

Recently, diffusion-based object removal models have achieved impressive results in eliminating objects and their associated visual effects. However, they indiscriminately denoise all tokens across all timesteps, ignoring the fact that removal usually involves only small foreground regions. This strategy incurs substantial computational overhead and long inference times. To reduce this burden, we propose a latent discriminator that implements Region-aware Adversarial Distillation (RAD), yielding a highly efficient few-step model named FlashClear. Furthermore, we propose FPAC (Foreground-Prioritized Asymmetric Attention and Caching), a training-free acceleration strategy tailored to few-step diffusion models. Extensive experiments demonstrate that our framework provides substantial acceleration while matching or exceeding the performance of our base model, ObjectClear. On the OBER benchmark, FlashClear achieves up to 8.26$\times$ and 122$\times$ speedups over ObjectClear and OmniPaint, respectively, while maintaining high visual quality and fidelity.

Preprint · 2026 · arXiv

G$^2$TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models

The development of separate-encoder unified multimodal models (UMMs) comes with rapidly growing inference costs due to dense visual token processing. In this paper, we focus on understanding-side visual token reduction to improve the efficiency of separate-encoder UMMs. While this topic has been widely studied for MLLMs, existing methods typically rely on signals such as attention scores or text-image similarity, implicitly assuming that the final objective is discriminative reasoning. This assumption does not hold for UMMs, where understanding-side visual tokens must also preserve the model's image-editing capabilities. We propose G$^2$TR, a generation-guided visual token reduction framework for separate-encoder UMMs. Our key insight is that the generation branch provides a task-agnostic signal for identifying understanding-side visual tokens that are not only semantically relevant but also important for latent-space image reconstruction and generation. G$^2$TR estimates token importance from consistency with the VAE latent, performs balanced token selection, and merges redundant tokens into retained representatives to reduce information loss. The method is training-free, plug-and-play, and applied only after the understanding encoding stage, making it compatible with existing UMM inference pipelines. Experiments on image understanding and editing benchmarks show that G$^2$TR substantially reduces visual tokens and reduces prefill computation by $1.94\times$ while maintaining both reasoning accuracy and editing quality, outperforming baselines on almost all benchmarks. Code is available at https://github.com/lijunxian111/G2TR.
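The last step of the pipeline, merging redundant tokens into retained representatives, can be illustrated with a generic score-based reduction: keep the highest-scoring tokens and average each dropped token into the kept token it is most similar to by cosine similarity. This is a hypothetical sketch, not G$^2$TR: the importance scores are taken as given (the paper derives them from consistency with the VAE latent), plain top-k stands in for the paper's balanced selection, and `reduce_tokens` is an illustrative name.

```python
# Generic top-k token reduction with similarity-based merging (illustrative; not G^2TR).
# Assumptions: importance scores are supplied by the caller, selection is plain top-k
# (not the paper's balanced selection), and merging is an unweighted average.
import math
from typing import List, Sequence

Token = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def reduce_tokens(tokens: List[Token], scores: Sequence[float], keep: int) -> List[Token]:
    """Keep the `keep` highest-scoring tokens (assumes keep >= 1) and fold every
    dropped token into the most similar kept token by averaging features."""
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])  # retained representatives, in original token order
    sums = {i: list(tokens[i]) for i in kept}
    counts = {i: 1 for i in kept}
    for j in order[keep:]:
        target = max(kept, key=lambda i: cosine(tokens[i], tokens[j]))
        sums[target] = [s + t for s, t in zip(sums[target], tokens[j])]
        counts[target] += 1
    return [[s / counts[i] for s in sums[i]] for i in kept]
```

Merging instead of simply discarding means each dropped token still contributes to some retained representative, which is the "reduce information loss" motivation stated in the abstract.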

Preprint · 2026 · arXiv

PermuQuant: Lowering Per-Group Quantization Error by Reordering Channels for Diffusion Models

Large-scale visual generative models have achieved remarkable performance. However, their high computational and memory costs make deployment challenging in resource-constrained scenarios, such as interactive applications and personal single-GPU use. Post-training quantization (PTQ) offers a practical solution by compressing pretrained models without expensive retraining, yet existing PTQ methods still suffer from severe quality degradation under extremely low-bit settings. In this paper, we identify channel ordering as an important but underexplored factor in per-group quantization. In this setting, each contiguous group shares one quantization scale. When channels with very different statistics are placed in the same group, the scale can be dominated by outliers and cause large quantization errors. Based on this observation, we propose PermuQuant, a simple and effective PTQ framework for low-bit diffusion models. PermuQuant sorts channels by a joint second-moment criterion before per-group quantization, placing channels with similar activation and weight statistics into the same group. It further uses a calibration-based acceptance rule to apply reordering only when the selected permutation reduces quantization error on calibration data. The selected permutations are absorbed into adjacent modules or applied to weights offline, avoiding explicit runtime permutation operations. Extensive experiments on multiple large diffusion models show that PermuQuant consistently reduces quantization error and outperforms existing PTQ baselines. On FLUX.1-dev with an RTX 5090, PermuQuant achieves up to a 1.8$\times$ single-step speedup and reduces the DiT memory footprint by 3.5$\times$ under W4A4 NVFP4 quantization. Code will be available at https://github.com/yscheng04/PermuQuant.
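The core observation, that a shared per-group scale is dominated by outlier channels and that reordering channels can fix this, is easy to reproduce on a toy matrix. The sketch below fake-quantizes weights in contiguous groups, sorts input channels by a joint second-moment score, and accepts the permutation only if it lowers the output error on calibration inputs. Assumptions not taken from the paper: symmetric 4-bit integer weight-only quantization (the paper reports W4A4 NVFP4), the score `act_m2[c] * w_m2[c]`, squared output error as the acceptance metric, and all function names, which are hypothetical.

```python
# Toy per-group quantization with channel reordering (a sketch under stated assumptions,
# not the PermuQuant implementation).
from typing import List, Sequence

Matrix = List[List[float]]


def quantize_groups(row: Sequence[float], group: int, bits: int = 4) -> List[float]:
    """Symmetric fake quantization: each contiguous block of `group` channels shares one scale."""
    qmax = 2 ** (bits - 1) - 1
    out: List[float] = []
    for start in range(0, len(row), group):
        chunk = row[start:start + group]
        scale = max(abs(v) for v in chunk) / qmax or 1.0  # all-zero group: any scale works
        out.extend(round(v / scale) * scale for v in chunk)
    return out


def quantize_with_perm(W: Matrix, group: int, perm: List[int]) -> Matrix:
    """Reorder input channels by perm, quantize per group, then restore the original order.

    Restoring the order here stands in for absorbing the permutation into adjacent modules.
    """
    inv = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inv[old_pos] = new_pos
    result = []
    for row in W:
        q = quantize_groups([row[c] for c in perm], group)
        result.append([q[inv[c]] for c in range(len(row))])
    return result


def output_error(X: Matrix, W: Matrix, Wq: Matrix) -> float:
    """Sum of squared differences between x @ W^T and x @ Wq^T over calibration rows x."""
    total = 0.0
    for x in X:
        for w, wq in zip(W, Wq):
            diff = sum(xi * (a - b) for xi, a, b in zip(x, w, wq))
            total += diff * diff
    return total


def second_moment_order(X: Matrix, W: Matrix) -> List[int]:
    """Sort input channels by a hypothetical joint score: E[x_c^2] * E[w_c^2]."""
    n = len(W[0])
    act_m2 = [sum(x[c] ** 2 for x in X) / len(X) for c in range(n)]
    w_m2 = [sum(w[c] ** 2 for w in W) / len(W) for c in range(n)]
    return sorted(range(n), key=lambda c: act_m2[c] * w_m2[c])


def choose_permutation(X: Matrix, W: Matrix, group: int) -> List[int]:
    """Calibration-based acceptance: keep the sorted order only if it lowers output error."""
    identity = list(range(len(W[0])))
    candidate = second_moment_order(X, W)
    base = output_error(X, W, quantize_with_perm(W, group, identity))
    new = output_error(X, W, quantize_with_perm(W, group, candidate))
    return candidate if new < base else identity
```

For example, `choose_permutation([[1.0, 1.0, 1.0, 1.0]], [[10.0, 0.1, 10.0, 0.1]], group=2)` returns `[1, 3, 0, 2]`: in the original order each group pairs a large channel with a small one, so the small weights round to zero, while after reordering the two small and the two large channels each get their own scale.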

Preprint · 2021 · arXiv

Joint value distribution of $L$-functions on the critical line

In this paper, we discuss the joint value distribution of $L$-functions in a suitable class. We obtain joint large deviation results in the central limit theorem for these $L$-functions, together with some mean value theorems; these give evidence that different $L$-functions are "statistically independent".
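For orientation (classical background, not a result of this paper): the prototype of such central limit theorems is Selberg's theorem for the Riemann zeta function. With $t$ drawn uniformly from $[T, 2T]$,

$$\frac{\log\lvert\zeta(\tfrac12+it)\rvert}{\sqrt{\tfrac12\log\log T}} \longrightarrow \mathcal{N}(0,1) \quad \text{in distribution as } T\to\infty.$$

"Statistical independence" of $L_1,\dots,L_m$ then means, informally, that the suitably normalized vector $\big(\log\lvert L_1(\tfrac12+it)\rvert,\dots,\log\lvert L_m(\tfrac12+it)\rvert\big)$ behaves like a vector of independent Gaussians, so that joint probabilities approximately factor:

$$\mathbb{P}\Big(\log\lvert L_j(\tfrac12+it)\rvert > V_j\,\sigma_j(T)\ \text{for all } j\Big) \approx \prod_{j=1}^{m}\mathbb{P}\Big(\log\lvert L_j(\tfrac12+it)\rvert > V_j\,\sigma_j(T)\Big).$$

Here $\sigma_j(T)$ is a normalizing factor (for $\zeta$ it is $\sqrt{\tfrac12\log\log T}$); the precise class of $L$-functions, the normalizations and the admissible ranges of $V_j$ are those of the paper and are not reproduced here.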

Preprint · 2020 · arXiv

Uniform Titchmarsh divisor problems

Asymptotic formulae for Titchmarsh-type divisor sums are obtained with strong error terms that are uniform in the shift parameter. The method also applies to more general arithmetic functions, such as those arising from sums of two squares, where it improves the error term for the number of representations of an integer as the sum of a prime and two squares, and to Fourier coefficients of cusp forms, where it generalizes a result of Pitt.
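For concreteness, one standard form of the shifted Titchmarsh divisor sum is $T(x,a)=\sum_{a<p\le x}\tau(p-a)$, with $p$ running over primes and $\tau$ the number-of-divisors function; it can be evaluated by brute force for small $x$. This only illustrates the quantity whose asymptotics are studied; the function names are hypothetical. For $a=1$ the classical asymptotic is $\sum_{p\le x}\tau(p-1)\sim\frac{\zeta(2)\zeta(3)}{\zeta(6)}\,x$.

```python
# Brute-force evaluation of the shifted Titchmarsh divisor sum
#   T(x, a) = sum of tau(p - a) over primes p with a < p <= x,
# where tau(n) counts the positive divisors of n. Illustration only.
import math
from typing import List


def primes_up_to(x: int) -> List[int]:
    """Sieve of Eratosthenes: all primes <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            for m in range(i * i, x + 1, i):
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def tau(n: int) -> int:
    """Number of positive divisors of n >= 1, pairing d with n // d up to sqrt(n)."""
    count = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            count += 1 if d * d == n else 2
    return count


def titchmarsh_sum(x: int, a: int) -> int:
    """Sum of tau(p - a) over primes p with a < p <= x."""
    return sum(tau(p - a) for p in primes_up_to(x) if p > a)
```

For example, `titchmarsh_sum(10, 1)` adds $\tau(1)+\tau(2)+\tau(4)+\tau(6)=1+2+3+4=10$.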

Preprint · 2019 · arXiv

The Surprising Accuracy of Benford's Law in Mathematics

Benford's law is an empirical "law" governing the frequency of leading digits in numerical data sets. Surprisingly, for mathematical sequences the predictions derived from it can be uncannily accurate. For example, among the first billion powers of $2$, exactly $301029995$ begin with the digit $1$, while the Benford prediction for this count is $10^9\log_{10}2=301029995.66\dots$. Similar "perfect hits" can be observed in other instances, such as the counts of leading digits $1$ and $2$ among the first billion powers of $3$. We prove results that explain many, but not all, of these surprising accuracies, and we relate the observed behavior to classical results in Diophantine approximation as well as to recent deep conjectures in this area.
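The powers-of-2 example can be checked on a smaller scale with exact integer arithmetic. The sketch counts leading digits of $2^1,\dots,2^N$ (reading "the first $N$ powers" as starting from $2^1$, which is an assumption) and compares the count with the Benford prediction $N\log_{10}(1+1/d)$; the helper names are hypothetical.

```python
# Leading-digit counts of base^1, ..., base^n_terms using exact big-integer arithmetic.
# Assumption: "the first N powers" means exponents 1 through N.
import math
from collections import Counter


def leading_digit_counts(base: int, n_terms: int) -> Counter:
    """Return a Counter mapping leading digit -> count over base^1..base^n_terms."""
    counts: Counter = Counter()
    value = 1
    for _ in range(n_terms):
        value *= base
        counts[int(str(value)[0])] += 1
    return counts


def benford_prediction(digit: int, n_terms: int) -> float:
    """Benford's law: expected count of leading digit d is n * log10(1 + 1/d)."""
    return n_terms * math.log10(1 + 1 / digit)


if __name__ == "__main__":
    N = 1000
    counts = leading_digit_counts(2, N)
    print(counts[1], benford_prediction(1, N))
    # Doubling adds a digit exactly when the result starts with 1, so for digit 1
    # the count equals (number of digits of 2^N) - 1 = floor(N * log10 2).
    assert counts[1] == len(str(2 ** N)) - 1 == math.floor(N * math.log10(2))
```

For $N=1000$ it reports $301$ powers with leading digit $1$ against a prediction of about $301.03$. For digit $1$ and base $2$ the agreement is structural: doubling a $d$-digit number produces a $(d+1)$-digit number exactly when the result begins with $1$, so the count is $\lfloor N\log_{10}2\rfloor$. This elementary observation covers only that one case; it is not the general argument of the paper.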

Preprint · 2016 · arXiv

A lower bound for the least prime in an arithmetic progression

Fix a positive integer $k$, and let $\ell$ be coprime to $k$. Let $p(k,\ell)$ denote the smallest prime congruent to $\ell \pmod{k}$, and let $P(k)$ be the maximum of $p(k,\ell)$ over all such $\ell$. We seek lower bounds for $P(k)$. In particular, we show that for almost every $k$ one has $P(k) \gg \varphi(k) \log k \log_2 k \log_4 k / \log_3 k$, where $\log_j$ denotes the $j$-fold iterated logarithm, answering a question of Ford, Green, Konyagin, Maynard, and Tao. We rely on their recent work on large gaps between primes. Our main new idea is to use sieve weights to capture not only primes but also small multiples of primes. We also give a heuristic suggesting that $\liminf_{k} \frac{P(k)}{\varphi(k) \log^2 k} = 1.$
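The quantities $p(k,\ell)$ and $P(k)$ can be computed by brute force for small moduli, which makes the definitions concrete. A minimal sketch with hypothetical helper names, using trial division (adequate only for small $k$):

```python
# Brute-force p(k, ell) = least prime congruent to ell mod k, and P(k) = max over ell coprime to k.
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def least_prime(k: int, ell: int) -> int:
    """Smallest prime p with p ≡ ell (mod k); requires gcd(ell, k) == 1 so one exists (Dirichlet)."""
    n = ell % k
    while not is_prime(n):
        n += k
    return n


def P(k: int) -> int:
    """Maximum of least_prime(k, ell) over residues ell coprime to k."""
    return max(least_prime(k, ell) for ell in range(k) if math.gcd(ell, k) == 1)
```

For $k=2,\dots,7$ this gives $P(k)=3, 7, 5, 19, 7, 29$; for instance $P(5)=19$ because $19$ is the first prime congruent to $4 \pmod 5$.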

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint