Source author record

Zixuan Wang

Zixuan Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Machine Learning, Computation and Language, math.NA, Cryptography and Security, eess.AS, eess.IV, Human-Computer Interaction, physics.app-ph, physics.flu-dyn, physics.geo-ph, quant-ph, Robotics, Sound

Catalog footprint

What is connected

18 works

15 topics

4 close collaborators

preprint · 2026 · arXiv

FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis

Deep Research (DR) agents, powered by advanced Large Language Models (LLMs), have recently garnered increasing attention for their capability in conducting complex research tasks. However, existing literature lacks a rigorous and systematic evaluation of DR agents' capabilities in critical research analysis. To address this gap, we first propose HisRubric, a novel evaluation framework with a hierarchical analytical structure and a fine-grained grading rubric for rigorously assessing DR agents' capabilities in corporate financial analysis. This framework mirrors the professional analyst's workflow, progressing from data recognition to metric calculation, and finally to strategic summarization and interpretation. Built on this framework, we construct the FinDeepResearch benchmark, which comprises 64 listed companies from 8 financial markets across 4 languages, encompassing a total of 15,808 grading items. We further conduct extensive experiments on FinDeepResearch using 16 representative methods, including 6 DR agents, 5 LLMs equipped with both deep reasoning and search capabilities, and 5 LLMs with deep reasoning capabilities only. The results reveal the strengths and limitations of these approaches across diverse capabilities, financial markets, and languages, offering valuable insights for future research and development. The benchmark and evaluation code are publicly available at https://OpenFinArena.com/.

preprint · 2026 · arXiv

Is Data Shapley Not Better than Random in Data Selection? Ask NASH

Data selection studies the problem of identifying high-quality subsets of training data. While some existing works have considered selecting the subset of data with top-$m$ Data Shapley or other semivalues as they account for the interaction among every subset of data, other works argue that Data Shapley can sometimes perform ineffectively in practice and select subsets that are no better than random. This raises the questions: (I) Are there certain "Shapley-informative" settings where Data Shapley consistently works well? (II) Can we strategically utilize these settings to select high-quality subsets consistently and efficiently? In this paper, we propose a novel data selection framework, NASH (Non-linear Aggregation of SHapley-informative components), which (I) decomposes the target utility function (e.g., validation accuracy) into simpler, Shapley-informative component functions, and selects data by optimizing an objective that (II) aggregates these components non-linearly. We demonstrate that NASH substantially boosts the effectiveness of Shapley/semivalue-based data selection with minimal additional runtime cost.
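As a point of reference for the top-$m$ selection rule mentioned above, the following is a minimal sketch of exact Shapley values on a three-point toy dataset. It is not the NASH method. The utility function, the point names and the scores (in accuracy percentage points) are all hypothetical and chosen only to make the arithmetic exact.

```python
from itertools import permutations


def shapley_values(points, utility):
    """Exact Shapley value of each point: its marginal contribution to
    `utility`, averaged over every ordering of `points`."""
    totals = {p: 0.0 for p in points}
    orderings = list(permutations(points))
    for order in orderings:
        coalition = []
        previous = utility(frozenset())
        for p in order:
            coalition.append(p)
            current = utility(frozenset(coalition))
            totals[p] += current - previous
            previous = current
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_utility(subset):
    # Hypothetical utility: "a" and "b" are redundant copies of each other,
    # while "c" carries independent information.
    score = 0
    if "a" in subset or "b" in subset:
        score += 80
    if "c" in subset:
        score += 30
    return score


values = shapley_values(["a", "b", "c"], toy_utility)
top_m = sorted(values, key=values.get, reverse=True)[:2]

print(values)                               # per-point Shapley values
print(top_m, toy_utility(frozenset(top_m)))  # top-2 subset and its utility
print(toy_utility(frozenset({"a", "c"})))    # a non-redundant alternative
```

In this toy setting the two redundant points receive the highest values, so the top-2 subset scores lower than a subset that mixes one of them with the independent point. This is only an illustration of how a per-point ranking can miss interactions; it makes no claim about the settings studied in the paper.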

preprint · 2026 · arXiv

Milestone-Guided Policy Learning for Long-Horizon Language Agents

While long-horizon agentic tasks require language agents to perform dozens of sequential decisions, training such agents with reinforcement learning remains challenging. We identify two root causes: credit misattribution, where correct early actions are penalized due to terminal failures, and sample inefficiency, where scarce successful trajectories result in near-total loss of learning signal. We introduce a milestone-guided policy learning framework, BEACON, that leverages the compositional structure of long-horizon tasks to ensure precise credit assignment. BEACON partitions trajectories at milestone boundaries, applies temporal reward shaping within segments to credit partial progress, and estimates advantages at dual scales to prevent distant failures from corrupting the evaluation of local actions. On ALFWorld, WebShop, and ScienceWorld, BEACON consistently outperforms GRPO and GiGPO. Notably, on long-horizon ALFWorld tasks, BEACON achieves 92.9% success rate, nearly doubling GRPO's 53.5%, while improving effective sample utilization from 23.7% to 82.0%. These results establish milestone-anchored credit assignment as an effective paradigm for training long-horizon language agents. Code is available at https://github.com/ZJU-REAL/BEACON.

preprint · 2026 · arXiv

When Rules Fall Short: Agent-Driven Discovery of Emerging Content Issues in Short Video Platforms

Trends on short-video platforms evolve at a rapid pace, with new content issues emerging every day that fall outside the coverage of existing annotation policies. However, traditional human-driven discovery of emerging issues is too slow, which leads to delayed updates of annotation policies and poses a major challenge for effective content governance. In this work, we propose an automatic issue discovery method based on multimodal LLM agents. Our approach automatically recalls short videos containing potential new issues and applies a two-stage clustering strategy to group them, with each cluster corresponding to a newly discovered issue. The agent then generates updated annotation policies from these clusters, thereby extending coverage to these emerging issues. Our agent has been deployed in the real system. Both offline and online experiments demonstrate that this agent-based method significantly improves the effectiveness of emerging-issue discovery (with an F1 score improvement of over 20%) and enhances the performance of subsequent issue governance (reducing the view count of problematic videos by approximately 15%). More importantly, compared to manual issue discovery, it greatly reduces time costs and substantially accelerates the iteration of annotation policies.

preprint · 2023 · arXiv

FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients

With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative models and image classification tasks, existing models primarily focus on single-modality and single-client control, that is, the diffusion process is driven by a single modality on a single computing node. To facilitate the secure fusion of heterogeneous data from clients, it is necessary to enable distributed multi-modal control, such as merging the hyperspectral data of organization A and the LiDAR data of organization B privately on each base station client. In this study, we propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where data from the two modalities are fed into separate branches of the encoder. Our key insight is that diffusion models driven by different modalities are inherently complementary in terms of potential denoising steps on which bilateral connections can be built. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure and introduce a lightweight communication module. Qualitative and quantitative experiments validate the superiority of our framework in terms of image quality and conditional consistency.

preprint · 2023 · arXiv

RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the high communication costs of sending model updates among multiple nodes are a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating training. Existing state-of-the-art gradient sparsification methods are mostly based on the "larger-absolute-more-important" criterion, ignoring the importance of small gradients, which is generally observed to affect the performance. Inspired by the informative representation of manifold structures from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme leveraging a neighborhood statistics indicator for RS image interpretation, termed RS-DGC. We first enhance the interdependence between gradients by introducing the gradient neighborhood to reduce the effect of random noise. The key component of RS-DGC is a Neighborhood Statistical Indicator (NSI), which can quantify the importance of gradients within a specified neighborhood on each node to sparsify the local gradients before gradient transmission in each iteration. Further, a layer-wise dynamic compression scheme is proposed to track the importance changes of each layer in real time. Extensive downstream tasks validate the superiority of our method in terms of intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset using the VGG-19 network.
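For orientation, the baseline "larger-absolute-more-important" criterion that the abstract contrasts itself with can be sketched as plain top-k magnitude sparsification. This is the generic baseline only, not RS-DGC or its Neighborhood Statistical Indicator. The function name, the `ratio` parameter and the sample gradient are illustrative assumptions.

```python
def topk_sparsify(grad, ratio):
    """Keep the `ratio` fraction of entries with the largest absolute value
    and zero out the rest (baseline magnitude criterion, not RS-DGC)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, int(len(grad) * ratio))
    # Indices of the k largest-magnitude entries.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


# Hypothetical flattened gradient of one layer.
grad = [0.02, -0.9, 0.001, 0.4, -0.05, 0.3]
sparse = topk_sparsify(grad, 0.5)
print(sparse)
```

Only the non-zero entries and their indices would need to be transmitted, which is where the communication saving comes from. Small entries such as 0.02 and -0.05 are dropped outright, which is the behavior the abstract identifies as a limitation.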

preprint · 2022 · arXiv

A simple criterion and experiments for onset of flocculation in kaolin clay suspensions

Cohesive effects between fine-grained sediment particles greatly influence their effective settling rate and erodibility. Many studies have observed a qualitative difference in settling dynamics between clays in freshwater, where particles remain dispersed, and in saltwater, where aggregates form and settle rapidly. The critical coagulation concentration (CCC) of salt that separates the two regimes, however, remains under-investigated, even though knowledge of the CCC is crucial to understanding aggregation in settings such as estuaries, where large salt concentration gradients occur. Furthermore, no simple criterion exists to predict the CCC for clay suspensions. In this study, systematic experiments are performed to determine the CCC by measuring transmitted light intensity through clay suspensions. To investigate the effect of ion valence, sodium chloride (NaCl) and calcium chloride (CaCl$_2$) are used. For kaolin clay, the results show a CCC of 0.6 mM NaCl ($\approx 0.04$ ppt NaCl $= 0.04$ PSU), and of 0.04 mM CaCl$_2$ ($\approx 0.004$ ppt CaCl$_2$). Because these salinities are lower than those commonly observed in nature, these findings indicate that kaolin clay should flocculate in nearly all natural aquatic environments. Furthermore, because tap water often has salinities higher than this threshold, these results imply that great care is needed in experiments, especially in large facilities where using distilled water is not feasible. In addition, a simple criterion to estimate the CCC for a kaolin clay suspension is derived. This criterion predicts that flocculation occurs at extremely low salt concentrations and is approximately independent of clay concentration, in agreement with the experimental observations and consistent with experimental evidence from the literature.
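The salinity equivalents quoted in this abstract follow from multiplying each molar concentration by the molar mass of the salt. A minimal sketch of that arithmetic is given here, assuming standard molar masses (58.44 g/mol for NaCl, 110.98 g/mol for CaCl$_2$, neither taken from the abstract) and treating 1 g of salt per litre as roughly 1 ppt, which is reasonable for such dilute solutions. The function name is illustrative.

```python
# Standard molar masses in g/mol (assumed values, not from the abstract).
MOLAR_MASS = {"NaCl": 58.44, "CaCl2": 110.98}


def mM_to_ppt(conc_mM, salt):
    """Convert a millimolar concentration to approximate parts per thousand.

    Assumes a dilute aqueous solution, so grams of salt per litre is taken
    as numerically equal to ppt.
    """
    grams_per_litre = conc_mM * 1e-3 * MOLAR_MASS[salt]
    return grams_per_litre


nacl_ppt = mM_to_ppt(0.6, "NaCl")     # CCC reported for NaCl
cacl2_ppt = mM_to_ppt(0.04, "CaCl2")  # CCC reported for CaCl2
print(f"NaCl:  {nacl_ppt:.4f} ppt")
print(f"CaCl2: {cacl2_ppt:.4f} ppt")
```

To one significant figure these values agree with the approximately 0.04 ppt and 0.004 ppt figures given in the abstract.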

preprint · 2022 · arXiv

Explicit and implicit models in infrared and visible image fusion

Infrared and visible images, as multi-modal image pairs, show significant differences in the expression of the same scene. The image fusion task faces two problems: one is to maintain the unique features of the different modalities, and the other is to maintain features at various levels, such as local and global features. This paper discusses the limitations of deep learning models in image fusion and the corresponding optimization strategies. Based on artificially designed structures and constraints, we divide models into explicit models, and implicit models that adaptively learn high-level features or can establish global pixel associations. We screened ten models for comparison experiments on 21 test sets. The qualitative and quantitative results show that the implicit models have a more comprehensive ability to learn image features. At the same time, their stability needs to be improved. In view of the advantages of existing algorithms and the limitations that remain to be solved, we discuss the main problems of multi-modal image fusion and future research directions.

preprint · 2022 · arXiv

Exploiting full Resolution Feature Context for Liver Tumor and Vessel Segmentation via Integrate Framework: Application to Liver Tumor and Vessel 3D Reconstruction under embedded microprocessor

Liver cancer is one of the most common malignant diseases in the world. Segmentation and labeling of liver tumors and blood vessels in CT images can assist doctors in liver tumor diagnosis and surgical intervention. Over the past decades, many state-of-the-art medical image segmentation algorithms have appeared. With the development of embedded devices, embedded deployment for medical segmentation and automatic reconstruction brings prospects for future automated surgical tasks. Yet most existing segmentation methods focus on the spatial feature context and have a perception defect in the semantic relevance of medical images, which significantly affects the segmentation accuracy of liver tumors and blood vessels. Deploying large and complex models on embedded devices requires a reasonable trade-off between model accuracy, reasoning speed and model capacity. Given these problems, we introduce a Transformer-based multi-scale feature fusion network called TransFusionNet. This network achieves very competitive performance on liver vessel and liver tumor segmentation tasks, and it can improve the recognition of morphologic margins of liver tumors by exploiting the global information of CT images. Experiments show that TransFusionNet achieved a mean Dice coefficient of 0.899 in the vessel segmentation task and a mean Dice coefficient of 0.961 in the liver tumor segmentation task. Compared with the state-of-the-art framework, our model achieves the best segmentation result. In addition, we deployed the model into an embedded micro-structure and constructed an integrated model for liver tumor vascular segmentation and reconstruction. This proprietary structure will be the exclusive component of the future medical field.
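The Dice coefficient reported above is the standard overlap measure $2|A \cap B| / (|A| + |B|)$ between a predicted mask $A$ and a reference mask $B$. A minimal sketch on flattened binary masks follows. The function name, the sample masks and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened masks (1 = tumor or vessel voxel, 0 = background).
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
print(f"{score:.3f}")
```

Here the masks share 2 foreground voxels out of 3 in each mask, giving 2 * 2 / 6. A mean Dice coefficient, as quoted in the abstract, would be the average of this score over the test cases.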

preprint · 2022 · arXiv

Large Single-Phonon Optomechanical Coupling between Quantum Dots and Tightly Confined Surface Acoustic Waves in the Quantum Regime

Surface acoustic waves (SAWs) coupled to quantum dots (QDs), trapped atoms and ions, and point defects have been proposed as quantum transduction platforms, yet the requisite coupling rates and cavity lifetimes have not been experimentally established. Although the interaction mechanism varies, small acoustic cavities with large zero-point motion are required for high efficiencies. We experimentally establish the feasibility of this platform through electro- and opto-mechanical characterization of tightly focusing, single-mode Gaussian SAW cavities at $\sim 3.6$ GHz on GaAs. We explore the performance limits of the platform by fabricating SAW cavities with mode volumes approaching $6\lambda^3$ and linewidths $\leq 1$ MHz. Employing strain-coupled single InAs QDs as optomechanical intermediaries, we measure single-phonon optomechanical coupling rates $g_0 \approx 2\pi \times 1.2$ MHz. Sideband scattering rates thus exceed intrinsic phonon loss, indicating the potential for quantum optical readout and transduction of cavity phonon states. To demonstrate the feasibility of this platform for low-noise ground-state quantum transduction, we develop a fiber-based confocal microscope in a dilution refrigerator and perform single-QD resonance fluorescence sideband spectroscopy at mK temperatures. These measurements show conversion between microwave phonons and optical photons with sub-natural linewidths.

preprint · 2022 · arXiv

Residual-guided Personalized Speech Synthesis based on Face Image

Previous works derive personalized speech features by training the model on a large dataset composed of the target speaker's audio. It has been reported that face information has a strong link with the speech sound. Thus, in this work, we extract personalized speech features from human faces to synthesize personalized speech using a neural vocoder. A Face-based Residual Personalized Speech Synthesis Model (FR-PSS) containing a speech encoder, a speech synthesizer and a face encoder is designed for PSS. In this model, by designing two speech priors, a residual-guided strategy is introduced to guide the face feature to approach the true speech feature during training. Moreover, considering the error in the absolute values of the features and their directional bias, we formulate a novel tri-item loss function for the face encoder. Experimental results show that the speech synthesized by our model is comparable to the personalized speech synthesized by training on a large amount of audio data in previous works.

preprint · 2021 · arXiv

ByteSGAN: A Semi-supervised Generative Adversarial Network for Encrypted Traffic Classification of SDN Edge Gateway in Green Communication Network

With the rapid development of Green Communication Networks, the types and quantity of network traffic data are increasing accordingly. Network traffic classification has become a non-trivial research task in the area of network management and security, which not only helps to improve fine-grained network resource allocation, but also enables policy-driven network management. Meanwhile, the combination of SDN and Edge Computing can leverage both the network-wide global visibility of SDN and the low latency and good privacy preservation of Edge Computing. However, capturing large labeled datasets is cumbersome and time-consuming manual labor. Semi-supervised learning is an appropriate technique to overcome this problem. With that in mind, we propose a Generative Adversarial Network (GAN)-based semi-supervised learning encrypted traffic classification method called ByteSGAN, embedded in the SDN Edge Gateway, to achieve fine-grained traffic classification and further improve network resource utilization. ByteSGAN can use only a small number of labeled traffic samples and a large number of unlabeled samples to achieve good traffic classification performance by modifying the structure and loss function of the regular GAN discriminator network in a semi-supervised learning way. Based on the public dataset 'ISCX2012 VPN-nonVPN', two experimental results show that ByteSGAN can efficiently improve the performance of the traffic classifier and outperform other supervised learning methods such as CNN.

preprint · 2021 · arXiv

Patterns for Representing Knowledge Graphs to Communicate Situational Knowledge of Service Robots

Service robots are envisioned to be adaptive to their working environment based on situational knowledge. Recent research has focused on designing visual representations of knowledge graphs for expert users. However, how to generate an understandable interface for non-expert users remains to be explored. In this paper, we use knowledge graphs (KGs) as a common ground for knowledge exchange and develop a pattern library for designing KG interfaces for non-expert users. After identifying the types of robotic situational knowledge from the literature, we present a formative study in which participants used cards to communicate the knowledge for given scenarios. We iteratively coded the results and identified patterns for representing various types of situational knowledge. To derive design recommendations for applying the patterns, we prototyped a lab service robot and conducted Wizard-of-Oz testing. The patterns and recommendations could provide useful guidance in designing knowledge-exchange interfaces for robots.

preprint · 2015 · arXiv

Intelligent Health Recommendation System for Computer Users

The time people spend in front of computers has been increasing steadily due to the role computers play in modern society. Individuals who sit in front of computers for an extended period of time, especially with improper posture, may incur various health issues. In this work, individuals' behaviors in front of computers are studied using web cameras. By means of a non-rigid face tracking system, data are analyzed to determine the 3D head pose, blink rate and yawn frequency of computer users. By combining these visual cues, a system of intelligent personal assistants for computer users is proposed.

preprint · 2015 · arXiv

Sparse Grid Discontinuous Galerkin Methods for High-Dimensional Elliptic Equations

This paper constitutes our initial effort in developing sparse grid discontinuous Galerkin (DG) methods for high-dimensional partial differential equations (PDEs). Over the past few decades, DG methods have gained popularity in many applications due to their distinctive features. However, they are often deemed too costly because of the large number of degrees of freedom of the approximation space, which are the main bottleneck for simulations in high dimensions. In this paper, we develop sparse grid DG methods for elliptic equations with the aim of breaking the curse of dimensionality. Using a hierarchical basis representation, we construct a sparse finite element approximation space, reducing the degrees of freedom from the standard $O(h^{-d})$ to $O(h^{-1}|\log_2 h|^{d-1})$ for $d$-dimensional problems, where $h$ is the uniform mesh size in each dimension. Our method, based on the interior penalty (IP) DG framework, can achieve accuracy of $O(h^{k}|\log_2 h|^{d-1})$ in the energy norm, where $k$ is the degree of polynomials used. Error estimates are provided and confirmed by numerical tests in multi-dimensions.
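To give a feel for the reduction in degrees of freedom stated in this abstract, the sketch below evaluates the two scaling expressions, $h^{-d}$ and $h^{-1}|\log_2 h|^{d-1}$, for a fixed mesh size. It drops all constants hidden in the $O(\cdot)$ notation, so the numbers indicate growth rates only, not actual sizes of the approximation spaces. The function names and the choice $h = 2^{-5}$ are illustrative.

```python
import math


def full_grid_scaling(h, d):
    """Leading-order growth h^(-d) of a standard tensor-product space."""
    return h ** (-d)


def sparse_grid_scaling(h, d):
    """Leading-order growth h^(-1) * |log2 h|^(d-1) of the sparse space."""
    return (1.0 / h) * abs(math.log2(h)) ** (d - 1)


h = 2.0 ** -5  # uniform mesh size in each dimension (hypothetical choice)
for d in (2, 4, 6):
    full = full_grid_scaling(h, d)
    sparse = sparse_grid_scaling(h, d)
    print(f"d={d}: full ~ {full:.3g}, sparse ~ {sparse:.3g}")
```

For this mesh size the full-grid expression grows by a factor of 1024 each time $d$ increases by 2, while the sparse-grid expression grows by a factor of 25, which is the sense in which the construction mitigates the curse of dimensionality.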

preprint · 2014 · arXiv

Geometric VLAD for Large Scale Image Search

We present a novel compact image descriptor for large scale image search. Our proposed descriptor, Geometric VLAD (gVLAD), is an extension of VLAD (Vector of Locally Aggregated Descriptors) that incorporates weak geometry information into the VLAD framework. The proposed geometry cues are derived as a membership function over keypoint angles, which contain informative cues that are nonetheless often discarded. A principled technique for learning the membership function by clustering angles is also presented. Further, to address the overhead of iterative codebook training over real-time datasets, a novel codebook adaptation strategy is outlined. Finally, we demonstrate the efficacy of the proposed gVLAD-based retrieval framework, where we achieve more than 15% improvement in mAP over existing benchmarks.

preprint · 2013 · arXiv

A New Discontinuous Galerkin Finite Element Method for Directly Solving the Hamilton-Jacobi Equations

In this paper, we improve upon the discontinuous Galerkin (DG) method for Hamilton-Jacobi (HJ) equations with convex Hamiltonians in (Y. Cheng and C.-W. Shu, J. Comput. Phys. 223:398-415, 2007) and develop a new DG method for directly solving the general HJ equations. The new method avoids the reconstruction of the solution across elements by utilizing the Roe speed at the cell interface. In addition, we propose an entropy fix by adding penalty terms proportional to the jump of the normal derivative of the numerical solution. The particular form of the entropy fix was inspired by Harten and Hyman's entropy fix (A. Harten and J. M. Hyman, J. Comput. Phys. 50(2):235-269, 1983) for the Roe scheme for conservation laws. The resulting scheme is compact, simple to implement even on unstructured meshes, and is demonstrated to work for nonconvex Hamiltonians. Benchmark numerical experiments in one dimension and two dimensions are provided to validate the performance of the method.

preprint · 2013 · arXiv

Who and Where: People and Location Co-Clustering

In this paper, we consider the clustering problem on images where each image contains patches in people and location domains. We exploit the correlation between people and location domains, and propose a semi-supervised co-clustering algorithm to cluster images. Our algorithm updates the correlation links at runtime and produces clusterings in both domains simultaneously. We conduct experiments on a manually collected dataset and a Flickr dataset. The results show that such correlation improves the clustering performance.
