Source author record

Changjian Wang

Changjian Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, cond-mat.mtrl-sci, Artificial Intelligence, cond-mat.mes-hall, eess.AS, eess.IV, Machine Learning, Multimedia, physics.comp-ph, Sound

Catalog footprint

What is connected

7 works

10 topics

4 close collaborators

Preprint, 2026, arXiv

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding

Video understanding requires active evidence seeking, motivating tool-augmented video agents for temporal reasoning, cross-modal understanding, and complex question answering. Existing video agents have improved video reasoning with retrieval, memory, frame inspection, and verifier tools, but they still face two limitations: (1) a coarse tool space that lacks fine-grained operations for compositional reasoning; and (2) a flat action space that forces high-level video intents into primitive executable tool calls. In this paper, we address these challenges with two complementary designs. First, we construct a MetaAug-Video Tool Library (MVTL), an extensible tool library with 134 registered tools, including 26 base tools for general multimodal signal processing and 108 meta tools for filtering, aggregation, reranking, formatting, and other intermediate-result operations. MVTL supports dual-level access to both structured video information and raw modal evidence, enabling diverse video reasoning scenarios. Second, we propose ReTool-Video, a recursive tool-using method that grounds high-level video intents into executable tool chains. In ReTool-Video, matched actions are executed directly, while unmatched intents are delegated to a resolver for parameter repair, tool substitution, or decomposition. This allows abstract actions such as temporal merging, cross-modal verification, or repeated-event aggregation to be progressively translated into concrete multimodal operations at runtime. Experiments on MVBench, MLVU, and Video-MME w/o sub. show that ReTool-Video consistently outperforms strong baselines. Further analysis demonstrates that recursive grounding and fine-grained meta tools improve the stability and effectiveness of complex video understanding.
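The recursive grounding idea in this abstract can be illustrated with a toy sketch. It is not the authors' implementation and does not use the MVTL API: the tool names, the `SUBSTITUTIONS` and `DECOMPOSITIONS` tables, and the `ground` function are all hypothetical, and parameter repair is omitted. An intent that matches a registered tool runs directly; an unmatched intent goes to a simple resolver that either substitutes a tool or decomposes the intent into sub-intents, which are grounded recursively.

```python
from typing import Any, Callable, Dict, List, Tuple


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping (start, end) intervals."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical toy registry (tool name -> callable); detections are hard-coded.
TOOLS: Dict[str, Callable[[Any], Any]] = {
    "detect_events": lambda video: [(0, 4), (3, 9), (20, 25)],
    "merge_intervals": merge_intervals,
    "count": len,
}

# Hypothetical resolver tables: substitution and decomposition of abstract intents.
SUBSTITUTIONS: Dict[str, str] = {"temporal_merge": "merge_intervals"}
DECOMPOSITIONS: Dict[str, List[str]] = {
    "count_distinct_events": ["detect_events", "temporal_merge", "count"],
}


def ground(intent: str, value: Any, depth: int = 0, max_depth: int = 4) -> Tuple[Any, List[str]]:
    """Ground an intent into executed tools; return (result, executed tool chain)."""
    if depth > max_depth:
        raise RecursionError(f"could not ground {intent!r} within {max_depth} levels")
    if intent in TOOLS:  # matched action: execute directly
        return TOOLS[intent](value), [intent]
    if intent in SUBSTITUTIONS:  # resolver: tool substitution
        return ground(SUBSTITUTIONS[intent], value, depth + 1, max_depth)
    if intent in DECOMPOSITIONS:  # resolver: decomposition into sub-intents
        chain: List[str] = []
        for sub_intent in DECOMPOSITIONS[intent]:
            value, used = ground(sub_intent, value, depth + 1, max_depth)
            chain.extend(used)
        return value, chain
    raise KeyError(f"no tool or resolver rule for intent {intent!r}")


result, chain = ground("count_distinct_events", "clip.mp4")
print(result, chain)
```

In this toy run the abstract intent `count_distinct_events` is translated at runtime into the chain `detect_events`, `merge_intervals`, `count`. The depth limit is an added safeguard against circular resolver rules, not something the abstract describes.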

Preprint, 2022, arXiv

Trusted Multi-Scale Classification Framework for Whole Slide Image

Despite remarkable efforts, the classification of gigapixel whole-slide images (WSIs) is severely restrained by either the constrained computing resources available for whole slides or the limited use of knowledge from different scales. Moreover, most previous attempts lacked the ability to estimate uncertainty. Pathologists often jointly analyze a WSI at different magnifications. If they are uncertain when using a single magnification, they change the magnification repeatedly to discover various features of the tissue. Motivated by this diagnostic process, in this paper we propose a trusted multi-scale classification framework for WSIs. Leveraging the Vision Transformer as the backbone for multiple branches, our framework can jointly model classification, estimate the uncertainty at each magnification of a microscope, and integrate the evidence from different magnifications. Moreover, to exploit discriminative patches from WSIs and reduce the requirement for computation resources, we propose a novel patch selection scheme using attention rollout and non-maximum suppression. To empirically investigate the effectiveness of our approach, experiments are conducted on our WSI classification tasks, using two benchmark databases. The results suggest that the trusted framework can significantly improve WSI classification performance compared with state-of-the-art methods.
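As an illustration of the non-maximum suppression step mentioned in this abstract, here is a minimal sketch of greedy NMS over scored patches. It is not the paper's selection scheme: the per-patch scores are assumed to come from some saliency source such as attention rollout, and the box format, `iou_threshold`, and `top_k` are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in slide pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def select_patches(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.3,
    top_k: int = 2,
) -> List[int]:
    """Greedy NMS: keep the highest-scoring patches that do not overlap
    an already kept patch by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
            if len(kept) == top_k:
                break
    return kept


# Two heavily overlapping candidates and one distant candidate.
boxes = [(0, 0, 10, 10), (2, 2, 12, 12), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.5]  # assumed saliency scores, e.g. from attention rollout
print(select_patches(boxes, scores))
```

The second box overlaps the first with an IoU of about 0.47, so it is suppressed and the distant third box is kept instead. This spreads the selected patches over distinct tissue regions rather than clustering them around one salient spot.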

Preprint, 2022, arXiv

Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

We present an approach to learn voice-face representations from talking-face videos, without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation of voice and face. These methods neglect the semantic content of different videos, introducing false-negative pairs as training noise. Furthermore, the positive pairs are constructed based on the natural correlation between audio clips and visual frames. However, this correlation might be weak or inaccurate in a large amount of real-world data, which introduces deviating positives into the contrastive paradigm. To address these issues, we propose cross-modal prototype contrastive learning (CMPC), which takes advantage of contrastive methods and resists the adverse effects of false negatives and deviating positives. On one hand, CMPC can learn intra-class invariance by constructing semantic-wise positives via unsupervised clustering in different modalities. On the other hand, by comparing the similarities of cross-modal instances with those of cross-modal prototypes, we dynamically recalibrate the unlearnable instances' contribution to the overall loss. Experiments show that the proposed approach outperforms state-of-the-art unsupervised methods on various voice-face association evaluation protocols. Additionally, in the low-shot supervision setting, our method also shows a significant improvement over previous instance-wise contrastive learning.

Preprint, 2022, arXiv

Wound Segmentation with Dynamic Illumination Correction and Dual-view Semantic Fusion

Wound image segmentation is a critical component for the clinical diagnosis and in-time treatment of wounds. Recently, deep learning has become the mainstream methodology for wound image segmentation. However, pre-processing of the wound image, such as illumination correction, is required before the training phase because it can greatly improve performance. The correction procedure and the training of deep models are independent of each other, which leads to sub-optimal segmentation performance, as a fixed illumination correction may not be suitable for all images. To address these issues, an end-to-end dual-view segmentation approach is proposed in this paper, incorporating a learnable illumination correction module into the deep segmentation models. The parameters of the module can be learned and updated automatically during the training stage, while the dual-view fusion can fully employ the features from both the raw images and the enhanced ones. To demonstrate the effectiveness and robustness of the proposed framework, extensive experiments are conducted on benchmark datasets. The encouraging results suggest that our framework can significantly improve segmentation performance compared to state-of-the-art methods.

Preprint, 2020, arXiv

Solute drag forces from equilibrium interface fluctuations

The design of polycrystalline alloys hinges on a predictive understanding of the interaction between the diffusing solutes and the motion of the constituent crystalline interfaces. Existing frameworks ignore the dynamic multiplicity of and transitions between the interfacial structures and phases. Here, we develop a computationally-accessible theoretical framework based on short-time equilibrium fluctuations to extract the drag force exerted by the segregating solute cloud. Using three distinct classes of computational techniques, we show that the random walk of a solute-loaded interface is necessarily non-classical at short time-scales as it occurs within a confining solute cloud. The much slower stochastic evolution of the cloud allows us to approximate the short-time behavior as an exponentially sub-diffusive Brownian motion in an external trapping potential with a stiffness set by the average drag force. At longer time-scales, the interfacial and bulk forces lead to a gradual recovery of classical random walk of the interface with a diffusivity set by the extrinsic mobility. The short-time response is accessible via ab initio computations, offering a firm foundation for high throughput, rational design of alloys for controlling microstructural evolution in polycrystals, and in particular for nanocrystalline alloys-by-design.
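For orientation, the textbook model behind "Brownian motion in an external trapping potential" is the overdamped Langevin equation in a harmonic well. The block below is that standard result only, not the paper's derivation; the symbols (interface position $x$, mobility $M$, trap stiffness $\kappa$, unit-variance white noise $\xi$) are assumptions introduced here, and how $\kappa$ relates to the average drag force is not reproduced.

```latex
\begin{align}
  \dot{x}(t) &= -M\kappa\,x(t) + \sqrt{2 M k_B T}\,\xi(t),
  \qquad \langle \xi(t)\,\xi(t') \rangle = \delta(t - t'), \\
  \langle x^2 \rangle_{\mathrm{eq}} &= \frac{k_B T}{\kappa}, \\
  \big\langle [x(t) - x(0)]^2 \big\rangle
  &= \frac{2 k_B T}{\kappa}\left(1 - e^{-M\kappa t}\right).
\end{align}
```

In this model the mean-squared displacement grows as $2 M k_B T\,t$ for $t \ll 1/(M\kappa)$ and saturates exponentially at $2 k_B T/\kappa$, so the plateau of the equilibrium fluctuations gives the stiffness.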

Preprint, 2016, arXiv

hi-RF: Incremental Learning Random Forest for large-scale multi-class Data Classification

In recent years, dynamically growing data and an incrementally growing number of classes have posed new challenges to large-scale data classification research. Most traditional methods struggle to balance precision and computational burden when the data and the number of classes increase: some have weak precision, while others are time-consuming. In this paper, we propose an incremental learning method, namely heterogeneous incremental Nearest Class Mean Random Forest (hi-RF), to handle this issue. It is a heterogeneous method that adaptively either replaces trees or updates tree leaves in the random forest, reducing the computational time at comparable performance when data of new classes arrive. Specifically, to keep the accuracy, a proportion of the trees is replaced by new NCM decision trees; to reduce the computational load, the remaining trees have only their leaf probabilities updated. Above all, out-of-bag estimation and out-of-bag boosting are proposed to balance accuracy and computational efficiency. Fair experiments were conducted and demonstrated comparable precision with much less computational time.
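To show why nearest-class-mean (NCM) models suit class-incremental learning, here is a minimal sketch of a plain NCM classifier. It is not hi-RF: there are no trees, no leaf updates, and no out-of-bag steps, and the class and method names are hypothetical. It only shows that a new class can be added by accumulating its mean, without touching the classes already learned.

```python
from typing import Dict, Hashable, List, Sequence


class NearestClassMean:
    """Plain NCM classifier with running per-class sums and counts."""

    def __init__(self) -> None:
        self.sums: Dict[Hashable, List[float]] = {}
        self.counts: Dict[Hashable, int] = {}

    def partial_fit(self, x: Sequence[float], label: Hashable) -> None:
        """Add one sample; an unseen label simply creates a new class mean."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x: Sequence[float]) -> Hashable:
        """Return the label whose mean is closest in squared Euclidean distance."""
        def sq_dist(label: Hashable) -> float:
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


model = NearestClassMean()
model.partial_fit([0.0, 0.0], "cat")
model.partial_fit([0.0, 2.0], "cat")
model.partial_fit([10.0, 10.0], "dog")
print(model.predict([1.0, 1.0]))

# A new class arrives later: only its own mean is created.
model.partial_fit([-10.0, -10.0], "bird")
print(model.predict([-9.0, -8.0]), model.predict([1.0, 1.0]))
```

Adding `"bird"` leaves the stored sums and counts for `"cat"` and `"dog"` unchanged, so earlier predictions near those means stay the same.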

Preprint, 2013, arXiv

Effect of solute segregation on shear-induced grain boundary motion

Atomic-scale simulations are performed to study the effect of solute segregation on the shear-induced motion of select grain boundaries in the classical α-Fe/C system. At shear rates larger than the solute diffusion rate, we observe a transition from coupled motion to sliding. Below a critical solute excess, the boundaries break away from the solute cloud and move in a coupled motion. At smaller shear rates, we observe extrinsic coupled motion at small stresses indicating that the coupling is aided by convective solute diffusion along the boundary. Our studies underscore the role of solutes in modifying the bicrystallography, temperature and rate dependence of shear accommodation at grain boundaries.
