Source author record

Yuxuan Zhang

Yuxuan Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

17 works

21 topics

4 close collaborators

Preprint, 2026, arXiv

Discrete Shift and Polarization from Response to Symmetry Defects in Interacting Topological Phases

We extend the previous study of extracting crystalline symmetry-protected topological invariants to the correlated regime. We construct the interacting Hofstadter model defined on a square lattice with rotation and translation symmetry defects: disclinations and dislocations. The model realizes a Chern insulator and a charge density wave state as one tunes the interactions. Employing the density matrix renormalization group (DMRG) method, we calculate the excess charge around the defects and find that the topological invariants remain quantized in both phases, with the topological quantity extracted to great precision. This study paves the way for utilizing matrix product states, and potentially other quantum many-body computation methods, to efficiently study crystalline symmetry defects on 2D interacting lattice systems.

Preprint, 2026, arXiv

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. A brief overview is available at https://z.ai/blog/glm-4.6v. Code, models and more information are released at https://github.com/zai-org/GLM-V.

Preprint, 2026, arXiv

RewardHarness: Self-Evolving Agentic Post-Training

Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference annotation and additional model training. This creates a data-efficiency gap: humans can often infer the target evaluation criteria from only a few examples, while models are usually trained on hundreds of thousands of comparisons. We present RewardHarness, a self-evolving agentic reward framework that reframes reward modeling as context evolution rather than weight optimization. Instead of learning from large-scale annotations, RewardHarness aligns with human preferences by iteratively evolving a library of tools and skills from as few as 100 preference demonstrations. Given a source image, candidate edited images, and an editing instruction, an Orchestrator selects the most relevant subset of tools and skills from the maintained library, and a frozen Sub-Agent uses them to construct a reasoning chain that produces a preference judgment. By comparing predicted judgments with ground-truth preferences and analyzing successes and failures in the reasoning process, the Orchestrator automatically refines its library of tools and skills without additional human annotation. Using only 0.05% of the EditReward preference data, RewardHarness achieves 47.4% average accuracy on image-editing evaluation benchmarks, surpassing GPT-5 by 5.3 points. When used as a reward signal for GRPO fine-tuning, RL-tuned models achieve 3.52 on ImgEdit-Bench. Project page: https://rewardharness.com.

Preprint, 2026, arXiv

TransLibEval: Demystify Large Language Models' Capability in Third-party Library-targeted Code Translation

In recent years, Large Language Models (LLMs) have been widely studied in the code translation field on the method, class, and even repository levels. However, most of these benchmarks are limited in terms of Third-Party Library (TPL) categories and scales, making TPL-related errors hard to expose and hindering the development of targeted solutions. Considering the high dependence (over 90%) on TPLs in practical programming, demystifying and analyzing LLMs' code translation performance involving various TPLs becomes imperative. To address this gap, we construct TransLibEval, the first benchmark dedicated to library-centric code translation. It consists of 200 real-world tasks across Python, Java, and C++, each explicitly involving TPLs from diverse categories such as data processing, machine learning, and web development, with comprehensive dependency coverage and high-coverage test suites. We evaluate seven recent LLMs of commercial, general, and code-specialized families under six translation strategies of three categories: Direct, IR-guided, and Retrieval-augmented. Experimental results show a dramatic performance drop compared with library-free settings (average CA decline over 60%), while diverse strategies demonstrate heterogeneous advantages. Furthermore, we analyze 4,831 failed cases from GPT-4o, one of the State-of-the-Art (SOTA) LLMs, revealing numerous third-party reference errors that were obscured previously. These findings highlight the unique challenges of library-centric translation and provide practical guidance for improving TPL-aware code intelligence.

Preprint, 2026, arXiv

WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions

With more than 11 times as many pageviews as the next largest edition, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming that English Wikipedia is a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, comprise complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.

Preprint, 2022, arXiv

All You Need is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines

Existing neural networks for computer vision tasks are vulnerable to adversarial attacks: adding imperceptible perturbations to the input images can fool these methods into making a false prediction on an image that was correctly predicted without the perturbation. Various defense methods have proposed image-to-image mapping methods, either including these perturbations in the training process or removing them in a preprocessing denoising step. In doing so, existing methods often ignore that the natural RGB images in today's datasets are not captured but, in fact, recovered from RAW color filter array captures that are subject to various degradations in the capture. In this work, we exploit this RAW data distribution as an empirical prior for adversarial defense. Specifically, we propose a model-agnostic adversarial defense method, which maps the input RGB images to Bayer RAW space and back to output RGB using a learned camera image signal processing (ISP) pipeline to eliminate potential adversarial patterns. The proposed method acts as an off-the-shelf preprocessing module and, unlike model-specific adversarial training methods, does not require adversarial images to train. As a result, the method generalizes to unseen tasks without additional retraining. Experiments on large-scale datasets (e.g., ImageNet, COCO) for different vision tasks (e.g., classification, semantic segmentation, object detection) validate that the method significantly outperforms existing methods across task domains.

Preprint, 2022, arXiv

Fractional disclination charge and discrete shift in the Hofstadter butterfly

In the presence of crystalline symmetries, topological phases of matter acquire a host of invariants leading to non-trivial quantized responses. Here we study a particular invariant, the discrete shift $\mathscr{S}$, for the square lattice Hofstadter model of free fermions. $\mathscr{S}$ is associated with a $\mathbb{Z}_M$ classification in the presence of $M$-fold rotational symmetry and charge conservation. $\mathscr{S}$ gives quantized contributions to (i) the fractional charge bound to a lattice disclination, and (ii) the angular momentum of the ground state with an additional, symmetrically inserted magnetic flux. $\mathscr{S}$ forms its own `Hofstadter butterfly', which we numerically compute, refining the usual phase diagram of the Hofstadter model. We propose an empirical formula for $\mathscr{S}$ in terms of density and flux per plaquette for the Hofstadter bands, and we derive a number of general constraints. We show that bands with the same Chern number may have different values of $\mathscr{S}$, although odd and even Chern number bands always have half-integer and integer values of $\mathscr{S}$ respectively.

Preprint, 2022, arXiv

Holographic simulation of correlated electrons on a trapped ion quantum processor

We develop holographic quantum simulation techniques to prepare correlated electronic ground states in quantum matrix product state (qMPS) form, using far fewer qubits than the number of orbitals represented. Our approach starts with a holographic technique to prepare a compressed approximation to electronic mean-field ground-states, known as fermionic Gaussian matrix product states (GMPS), with a polynomial reduction in qubit- and (in select cases gate-) resources compared to existing techniques. Correlations are then introduced by augmenting the GMPS circuits in a variational technique which we denote GMPS+X. We demonstrate this approach on Quantinuum's System Model H1 trapped-ion quantum processor for $1d$ models of correlated metal and Mott insulating states. Focusing on the $1d$ Fermi-Hubbard chain as a benchmark, we show that GMPS+X methods faithfully capture the physics of correlated electron states, including Mott insulators and correlated Luttinger liquid metals, using considerably fewer parameters than problem-agnostic variational circuits.

Preprint, 2022, arXiv

Straddling-gates problem in multipartite quantum systems

We study a variant of quantum circuit complexity, the binding complexity: Consider an $n$-qubit system divided into two sets of $k_1$, $k_2$ qubits each ($k_1\leq k_2$), where gates within each set are free; what is the least cost of two-qubit gates "straddling" the sets for preparing an arbitrary quantum state, assuming no ancilla qubits are allowed? First, our work suggests that, without making assumptions on the entanglement spectrum, $\Theta(2^{k_1})$ straddling gates always suffice. We then prove any $\text{U}(2^n)$ unitary synthesis can be accomplished with $\Theta(4^{k_1})$ straddling gates. Furthermore, we extend our results to multipartite systems, and show that any $m$-partite Schmidt decomposable state has binding complexity linear in $m$, which hints at its multi-separable property. This result not only resolves an open problem posed by Vijay Balasubramanian, who was initially motivated by the "Complexity=Volume" conjecture in quantum gravity, but also offers realistic applications in distributed quantum computation in the near future.

Preprint, 2022, arXiv

The clamped intensity of femtosecond laser pulses varying with gas pressure in the presence of external focusing

We perform a theoretical investigation of the clamped laser intensity inside the filament plasma as a function of gas pressure with external focusing. Unlike the clamped intensity under the self-focusing condition, which is independent of the gas pressure, the clamped intensity with external focusing decreases with the gas pressure. Our findings can explain the changes in the signals of femtosecond-laser-induced 391-nm forward emission and fluorescence with the nitrogen gas pressure.

Preprint, 2022, arXiv

The E-Bayesian Estimation and its E-MSE of Lomax distribution under different loss functions

This paper studies the E-Bayesian (expectation of the Bayesian estimation) estimation of the parameter of the Lomax distribution based on different loss functions. Under different loss functions, we calculate the Bayesian estimation of the parameter and then calculate the expectation of the estimated value to get the E-Bayesian estimation. To measure the estimation error, the E-MSE (expected mean squared error) is introduced, and the formulas of the E-Bayesian estimation and E-MSE are given. By applying Markov Chain Monte Carlo techniques, we analyze the performance of the proposed methods. Results are compared on the basis of E-MSE. Then, cases of samples in real data sets are presented for illustration. In order to test whether the Lomax distribution can be used in analyzing the datasets, Kolmogorov-Smirnov tests are conducted. Using real data, we also obtain the maximum likelihood estimation and compare it with the E-Bayesian estimation. Finally, we present the comparison between the Bayesian and E-Bayesian estimation methods under three different loss functions.

Preprint, 2022, arXiv

The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement

Modern smartphones can continuously stream multi-megapixel RGB images at 60Hz, synchronized with high-quality 3D pose information and low-resolution LiDAR-driven depth estimates. During a snapshot photograph, the natural unsteadiness of the photographer's hands offers millimeter-scale variation in camera pose, which we can capture along with RGB and depth in a circular buffer. In this work we explore how, from a bundle of these measurements acquired during viewfinding, we can combine dense micro-baseline parallax cues with kilopixel LiDAR depth to distill a high-fidelity depth map. We take a test-time optimization approach and train a coordinate MLP to output photometrically and geometrically consistent depth estimates at the continuous coordinates along the path traced by the photographer's natural hand shake. With no additional hardware, artificial hand motion, or user interaction beyond the press of a button, our proposed method brings high-resolution depth estimates to point-and-shoot "tabletop" photography -- textured objects at close range.

Preprint, 2021, arXiv

A prognostic dynamic model applicable to infectious diseases providing easily visualized guides -- A case study of COVID-19 in the UK

A reasonable prediction of the infectious disease transmission process under different disease control strategies is an important reference point for policy makers. Here we established a dynamic transmission model in Python and realized comprehensive regulation of disease control measures. We classified government interventions into three categories and introduced three parameters to describe the key points in disease control: intraregional growth rate, interregional communication rate, and detection rate of infectors. Our simulation predicts that COVID-19 infection in the UK would be out of control in 73 days without any interventions; at the same time, herd immunity acquisition would begin from the epicentre. After we introduced government interventions, a single intervention is effective in disease control but at huge expense, while combined interventions are more efficient; among these, enhancing the detection number is crucial in the control strategy for COVID-19. In addition, we calculated requirements for the most effective vaccination strategy based on the infection number in the real situation. Our model was programmed with iterative algorithms and visualized via cellular automata. It can be applied to similar epidemics in other regions if the basic parameters are supplied, and is able to synthetically mimic the effect of multiple factors in infectious disease control.

Preprint, 2021, arXiv

Deep Neural Network Fingerprinting by Conferrable Adversarial Examples

In Machine Learning as a Service, a provider trains a deep neural network and gives many users access. The hosted (source) model is susceptible to model stealing attacks, where an adversary derives a surrogate model from API access to the source model. For post hoc detection of such attacks, the provider needs a robust method to determine whether a suspect model is a surrogate of their model. We propose a fingerprinting method for deep neural network classifiers that extracts a set of inputs from the source model so that only surrogates agree with the source model on the classification of such inputs. These inputs are a subclass of transferable adversarial examples which we call conferrable adversarial examples that exclusively transfer with a target label from a source model to its surrogates. We propose a new method to generate these conferrable adversarial examples. We present an extensive study on the irremovability of our fingerprint against fine-tuning, weight pruning, retraining, retraining with different architectures, three model extraction attacks from related work, transfer learning, adversarial training, and two new adaptive attacks. Our fingerprint is robust against distillation, related model extraction attacks, and even transfer learning when the attacker has no access to the model provider's dataset. Our fingerprint is the first method that reaches a ROC AUC of 1.0 in verifying surrogates, compared to a ROC AUC of 0.63 by previous fingerprints.

Preprint, 2020, arXiv

A Combined Data-driven and Physics-driven Method for Steady Heat Conduction Prediction using Deep Convolutional Neural Networks

Machine learning methods offer several advantages as an alternative for predicting physical fields and can be classified into two distinct types: data-driven methods, which rely on training data, and physics-driven methods, which use physical laws. Taking the heat conduction problem as an example, we compare the data- and physics-driven learning processes with deep Convolutional Neural Networks (CNNs). The results show that the convergence of the error to the ground truth solution and that of the residual of the heat conduction equation exhibit remarkable differences. Based on this observation, we propose a combined-driven method for faster learning and more accurate solutions. With a weighted loss function, reference data and the physical equation are able to simultaneously drive the learning. Several numerical experiments are conducted to investigate the effectiveness of the combined method. For the data-driven method, the introduction of the physical equation not only speeds up the convergence, but also produces physically more consistent solutions. For the physics-driven method, it is observed that the combined method is able to speed up the convergence by up to 49.0% by using a coarse reference that is not very restrictive.
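The idea of a weighted loss that lets reference data and a physical equation drive learning together can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the single `weight` parameter, the convex-combination form of the loss, and the use of a five-point finite-difference Laplace residual (steady conduction with constant conductivity, no heat source, unit grid spacing) as the physics term.

```python
def data_mse(T, T_ref):
    """Mean squared error between a predicted field and reference data."""
    diffs = [(a - b) ** 2
             for row, ref_row in zip(T, T_ref)
             for a, b in zip(row, ref_row)]
    return sum(diffs) / len(diffs)


def laplace_residual(T):
    """Mean squared five-point Laplace residual over interior grid nodes."""
    rows, cols = len(T), len(T[0])
    total, count = 0.0, 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            r = (T[i - 1][j] + T[i + 1][j] + T[i][j - 1] + T[i][j + 1]
                 - 4.0 * T[i][j])
            total += r * r
            count += 1
    return total / count


def combined_loss(T, T_ref, weight=0.5):
    """Hypothetical weighted sum: weight on data term, rest on physics term."""
    return weight * data_mse(T, T_ref) + (1.0 - weight) * laplace_residual(T)


# A linear temperature field satisfies the discrete Laplace equation exactly.
linear = [[float(i)] * 3 for i in range(3)]
# Perturbing the centre node violates both the data term and the physics term.
bumped = [row[:] for row in linear]
bumped[1][1] += 1.0

print(combined_loss(linear, linear))
print(combined_loss(bumped, linear))
```

In this sketch, setting `weight` to 1.0 recovers a purely data-driven loss and 0.0 a purely physics-driven one; how the paper actually balances the two terms is not stated in the abstract.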

Preprint, 2020, arXiv

A Sparse Learning Approach to the Detection of Multiple Noise-Like Jammers

In this paper, we address the problem of detecting multiple Noise-Like Jammers (NLJs) through a radar system equipped with an array of sensors. To this end, we develop an elegant and systematic framework wherein two architectures are devised to jointly detect an unknown number of NLJs and to estimate their respective angles of arrival. The approach relies on the likelihood ratio test in conjunction with a cyclic estimation procedure which incorporates a sparsity-promoting prior at the design stage. As a matter of fact, the problem at hand has an inherently sparse nature, which is suitably exploited. This methodological choice is dictated by the fact that, from a mathematical point of view, the classical maximum likelihood approach leads to intractable optimization problems (at least to the best of the authors' knowledge) and, hence, a suboptimum approach represents a viable means to solve them. Performance analysis is conducted on simulated data and shows the effectiveness of the proposed architectures in drawing a reliable picture of the electromagnetic threats illuminating the radar system.

Preprint, 2020, arXiv

Adaptive Radar Detection and Classification Algorithms for Multiple Coherent Signals

In this paper, we address the problem of target detection in the presence of coherent (or fully correlated) signals, which can be due to multipath propagation effects or electronic attacks by smart jammers. To this end, we formulate the problem at hand as a multiple-hypothesis test that, besides the conventional radar alternative hypothesis, contains additional hypotheses accounting for the presence of an unknown number of interfering signals. In this context and leveraging the classification capabilities of the Model Order Selection rules, we devise penalized likelihood-ratio-based detection architectures that can establish, as a byproduct, which hypothesis is in force. Moreover, we propose a suboptimum procedure to estimate the angles of arrival of multiple coherent signals ensuring (at least for the considered parameters) almost the same performance as the exhaustive search. Finally, the performance assessment, conducted over simulated data and in comparison with conventional radar detectors, highlights that the proposed architectures can provide satisfactory performance in terms of probability of detection and correct classification.
