Researcher profile

Changshui Zhang

Changshui Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
21works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

21 published item(s)

preprint2026arXiv

ECG-WM: A Physiology-Informed ECG World Model for Clinical Intervention Simulation

Electrocardiogram (ECG)-based models have achieved strong performance in diagnostic tasks, yet they remain limited in modeling how cardiac dynamics evolve under external interventions. In particular, existing approaches focus primarily on static prediction and lack mechanisms to capture ECG variations under different pharmacological conditions. In this work, we propose an ECG World Model for action-conditioned predictive simulation of cardiac electrophysiology. Moving beyond disjoint pipelines, our framework features a principled integration of physiological ordinary differential equation (ODE) priors into latent diffusion dynamics via energy regularization. This structural constraint enables the synthesis of physiologically plausible post-intervention ECG trajectories while effectively mitigating generative hallucinations. Building on this simulation process, we introduce an uncertainty-aware evaluation strategy that leverages the stochasticity of diffusion sampling to characterize both the expected clinical risk and its variability, allowing a more reliable comparative assessment of candidate interventions. We evaluate our method across diverse settings, including controlled drug-response scenarios and real-world clinical records. Beyond standard waveform metrics, experimental results demonstrate improved risk calibration and strong alignment with expert-informed treatment preferences. These results establish our approach as a robust foundation for safe and intervention-aware clinical decision support.

preprint2022arXiv

A Machine Learning Method for Material Property Prediction: Example Polymer Compatibility

Prediction of material property is a key problem because of its significance to material design and screening. We present a brand-new and general machine learning method for material property prediction. As a representative example, polymer compatibility is chosen to demonstrate the effectiveness of our method. Specifically, we mine data from related literature to build a specific database and give a prediction based on the basic molecular structures of blending polymers and, as auxiliary, the blending composition. Our model obtains at least 75% accuracy on the dataset consisting of thousands of entries. We demonstrate that the relationship between structure and properties can be learned and simulated by machine learning method.

preprint2022arXiv

Collaboration Equilibrium in Federated Learning

Federated learning (FL) refers to the paradigm of learning models over a collaborative research network involving multiple clients without sacrificing privacy. Recently, there have been rising concerns on the distributional discrepancies across different clients, which could even cause counterproductive consequences when collaborating with others. While it is not necessarily that collaborating with all clients will achieve the best performance, in this paper, we study a rational collaboration called ``collaboration equilibrium'' (CE), where smaller collaboration coalitions are formed. Each client collaborates with certain members who maximally improve the model learning and isolates the others who make little contribution. We propose the concept of benefit graph which describes how each client can benefit from collaborating with other clients and advance a Pareto optimization approach to identify the optimal collaborators. Then we theoretically prove that we can reach a CE from the benefit graph through an iterative graph operation. Our framework provides a new way of setting up collaborations in a research network. Experiments on both synthetic and real world data sets are provided to demonstrate the effectiveness of our method.

preprint2022arXiv

GreedyNASv2: Greedier Search with a Greedy Path Filter

Training a good supernet in one-shot NAS methods is difficult since the search space is usually considerably huge (e.g., $13^{21}$). In order to enhance the supernet's evaluation ability, one greedy strategy is to sample good paths, and let the supernet lean towards the good ones and ease its evaluation burden as a result. However, in practice the search can be still quite inefficient since the identification of good paths is not accurate enough and sampled paths still scatter around the whole search space. In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter those weak ones, so that the search can be thus implemented on the shrunk space more greedily and efficiently. Concretely, based on the fact that good paths are much less than the weak ones in the space, we argue that the label of "weak paths" will be more confident and reliable than that of "good paths" in multi-path sampling. In this way, we thus cast the training of path filter in the positive and unlabeled (PU) learning paradigm, and also encourage a \textit{path embedding} as better path/operation representation to enhance the identification capacity of the learned filter. By dint of this embedding, we can further shrink the search space by aggregating similar operations with similar embeddings, and the search can be more efficient and accurate. Extensive experiments validate the effectiveness of the proposed method GreedyNASv2. For example, our obtained GreedyNASv2-L achieves $81.1\%$ Top-1 accuracy on ImageNet dataset, significantly outperforming the ResNet-50 strong baselines.

preprint2022arXiv

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model both the appearance features of each object and the spatial-temporal relationships among objects. TransSTAM consists of two major parts: (1) The encoder utilizes the powerful self-attention mechanism of Transformer to learn discriminative features for each tracklet; (2) The decoder adopts the standard cross-attention mechanism to model the affinities between the tracklets and the detections by taking both spatial-temporal and appearance features into account. TransSTAM has two major advantages: (1) It is solely based on the encoder-decoder architecture and enjoys a compact network design, hence being computationally efficient; (2) It can effectively learn spatial-temporal and appearance features within one model, hence achieving better tracking accuracy. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA with respect to previous state-of-the-art approaches on all the benchmarks. Our code is available at \url{https://github.com/icicle4/TranSTAM}.

preprint2022arXiv

METGEN: A Module-Based Entailment Tree Generation Framework for Answer Explanation

Knowing the reasoning chains from knowledge to the predicted answers can help construct an explainable question answering (QA) system. Advances on QA explanation propose to explain the answers with entailment trees composed of multiple entailment steps. While current work proposes to generate entailment trees with end-to-end generative models, the steps in the generated trees are not constrained and could be unreliable. In this paper, we propose METGEN, a Module-based Entailment Tree GENeration framework that has multiple modules and a reasoning controller. Given a question and several supporting knowledge, METGEN can iteratively generate the entailment tree by conducting single-step entailment with separate modules and selecting the reasoning flow with the controller. As each module is guided to perform a specific type of entailment reasoning, the steps generated by METGEN are more reliable and valid. Experiment results on the standard benchmark show that METGEN can outperform previous state-of-the-art models with only 9% of the parameters.

preprint2022arXiv

Rethinking Audio-visual Synchronization for Active Speaker Detection

Active speaker detection (ASD) systems are important modules for analyzing multi-talker conversations. They aim to detect which speakers or none are talking in a visual scene at any given time. Existing research on ASD does not agree on the definition of active speakers. We clarify the definition in this work and require synchronization between the audio and visual speaking activities. This clarification of definition is motivated by our extensive experiments, through which we discover that existing ASD methods fail in modeling the audio-visual synchronization and often classify unsynchronized videos as active speaking. To address this problem, we propose a cross-modal contrastive learning strategy and apply positional encoding in attention modules for supervised ASD models to leverage the synchronization cue. Experimental results suggest that our model can successfully detect unsynchronized speaking as not speaking, addressing the limitation of current models.

preprint2022arXiv

Searching for Network Width with Bilaterally Coupled Network

Searching for a more compact network width recently serves as an effective way of channel pruning for the deployment of convolutional neural networks (CNNs) under hardware constraints. To fulfill the searching, a one-shot supernet is usually leveraged to efficiently evaluate the performance \wrt~different network widths. However, current methods mainly follow a \textit{unilaterally augmented} (UA) principle for the evaluation of each width, which induces the training unfairness of channels in supernet. In this paper, we introduce a new supernet called Bilaterally Coupled Network (BCNet) to address this issue. In BCNet, each channel is fairly trained and responsible for the same amount of network widths, thus each network width can be evaluated more accurately. Besides, we propose to reduce the redundant search space and present the BCNetV2 as the enhanced supernet to ensure rigorous training fairness over channels. Furthermore, we leverage a stochastic complementary strategy for training the BCNet, and propose a prior initial population sampling method to boost the performance of the evolutionary search. We also propose the first open-source width benchmark on macro structures named Channel-Bench-Macro for the better comparison of width search algorithms. Extensive experiments on benchmark CIFAR-10 and ImageNet datasets indicate that our method can achieve state-of-the-art or competing performance over other baseline methods. Moreover, our method turns out to further boost the performance of NAS models by refining their network widths. For example, with the same FLOPs budget, our obtained EfficientNet-B0 achieves 77.53\% Top-1 accuracy on ImageNet dataset, surpassing the performance of original setting by 0.65\%.

preprint2022arXiv

Stretchable Cells Help DARTS Search Better

Differentiable neural architecture search (DARTS) has gained much success in discovering flexible and diverse cell types. To reduce the evaluation gap, the supernet is expected to have identical layers with the target network. However, even for this consistent search, the searched cells often suffer from poor performance, especially for the supernet with fewer layers, as current DARTS methods are prone to wide and shallow cells, and this topology collapse induces sub-optimal searched cells. In this paper, we alleviate this issue by endowing the cells with explicit stretchability, so the search can be directly implemented on our stretchable cells for both operation type and topology simultaneously. Concretely, we introduce a set of topological variables and a combinatorial probabilistic distribution to explicitly model the target topology. With more diverse and complex topologies, our method adapts well for various layer numbers. Extensive experiments on CIFAR-10 and ImageNet show that our stretchable cells obtain better performance with fewer layers and parameters. For example, our method can improve DARTS by 0.28\% accuracy on CIFAR-10 dataset with 45\% parameters reduced or 2.9\% with similar FLOPs on ImageNet dataset.

preprint2022arXiv

VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution

The visual dialog task requires an AI agent to interact with humans in multi-round dialogs based on a visual environment. As a common linguistic phenomenon, pronouns are often used in dialogs to improve the communication efficiency. As a result, resolving pronouns (i.e., grounding pronouns to the noun phrases they refer to) is an essential step towards understanding dialogs. In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution in both implicit and explicit ways. First, to implicitly help models understand pronouns, we design novel methods to perform the joint training of the pronoun coreference resolution and visual dialog tasks. Second, after observing that the coreference relationship of pronouns and their referents indicates the relevance between dialog rounds, we propose to explicitly prune the irrelevant history rounds in visual dialog models' input. With pruned input, the models can focus on relevant dialog history and ignore the distraction in the irrelevant one. With the proposed implicit and explicit methods, VD-PCR achieves state-of-the-art experimental results on the VisDial dataset.

preprint2021arXiv

Deep Likelihood Network for Image Restoration with Multiple Degradation Levels

Convolutional neural networks have been proven effective in a variety of image restoration tasks. Most state-of-the-art solutions, however, are trained using images with a single particular degradation level, and their performance deteriorates drastically when applied to other degradation settings. In this paper, we propose deep likelihood network (DL-Net), aiming at generalizing off-the-shelf image restoration networks to succeed over a spectrum of degradation levels. We slightly modify an off-the-shelf network by appending a simple recursive module, which is derived from a fidelity term, for disentangling the computation for multiple degradation levels. Extensive experimental results on image inpainting, interpolation, and super-resolution show the effectiveness of our DL-Net.

preprint2021arXiv

Locally Free Weight Sharing for Network Width Search

Searching for network width is an effective way to slim deep neural networks with hardware budgets. With this aim, a one-shot supernet is usually leveraged as a performance evaluator to rank the performance \wrt~different width. Nevertheless, current methods mainly follow a manually fixed weight sharing pattern, which is limited to distinguish the performance gap of different width. In this paper, to better evaluate each width, we propose a locally free weight sharing strategy (CafeNet) accordingly. In CafeNet, weights are more freely shared, and each width is jointly indicated by its base channels and free channels, where free channels are supposed to loCAte FrEely in a local zone to better represent each width. Besides, we propose to further reduce the search space by leveraging our introduced FLOPs-sensitive bins. As a result, our CafeNet can be trained stochastically and get optimized within a min-min strategy. Extensive experiments on ImageNet, CIFAR-10, CelebA and MS COCO dataset have verified our superiority comparing to other state-of-the-art baselines. For example, our method can further boost the benchmark NAS network EfficientNet-B0 by 0.41\% via searching its width more delicately.

preprint2020arXiv

Boosting Semantic Human Matting with Coarse Annotations

Semantic human matting aims to estimate the per-pixel opacity of the foreground human regions. It is quite challenging and usually requires user interactive trimaps and plenty of high quality annotated data. Annotating such kind of data is labor intensive and requires great skills beyond normal users, especially considering the very detailed hair part of humans. In contrast, coarse annotated human dataset is much easier to acquire and collect from the public dataset. In this paper, we propose to use coarse annotated data coupled with fine annotated data to boost end-to-end semantic human matting without trimaps as extra input. Specifically, we train a mask prediction network to estimate the coarse semantic mask using the hybrid data, and then propose a quality unification network to unify the quality of the previous coarse mask outputs. A matting refinement network takes in the unified mask and the input image to predict the final alpha matte. The collected coarse annotated dataset enriches our dataset significantly, allows generating high quality alpha matte for real images. Experimental results show that the proposed method performs comparably against state-of-the-art methods. Moreover, the proposed method can be used for refining coarse annotated public dataset, as well as semantic segmentation methods, which reduces the cost of annotating high quality human data to a great extent.

preprint2020arXiv

Constructing Geographic and Long-term Temporal Graph for Traffic Forecasting

Traffic forecasting influences various intelligent transportation system (ITS) services and is of great significance for user experience as well as urban traffic control. It is challenging due to the fact that the road network contains complex and time-varying spatial-temporal dependencies. Recently, deep learning based methods have achieved promising results by adopting graph convolutional network (GCN) to extract the spatial correlations and recurrent neural network (RNN) to capture the temporal dependencies. However, the existing methods often construct the graph only based on road network connectivity, which limits the interaction between roads. In this work, we propose Geographic and Long term Temporal Graph Convolutional Recurrent Neural Network (GLT-GCRNN), a novel framework for traffic forecasting that learns the rich interactions between roads sharing similar geographic or longterm temporal patterns. Extensive experiments on a real-world traffic state dataset validate the effectiveness of our method by showing that GLT-GCRNN outperforms the state-of-the-art methods in terms of different metrics.

preprint2020arXiv

Few Sample Knowledge Distillation for Efficient Network Compression

Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning suffers from the requirement of a large training set and the time-consuming training procedure. This paper proposes a novel solution for knowledge distillation from label-free few samples to realize both data efficiency and training/processing efficiency. We treat the original network as "teacher-net" and the compressed network as "student-net". A 1x1 convolution layer is added at the end of each layer block of the student-net, and we fit the block-level outputs of the student-net to the teacher-net by estimating the parameters of the added layers. We prove that the added layer can be merged without adding extra parameters and computation cost during inference. Experiments on multiple datasets and network architectures verify the method's effectiveness on student-nets obtained by various network pruning and weight decomposition methods. Our method can recover student-net's accuracy to the same level as conventional fine-tuning methods in minutes while using only 1% label-free data of the full training data.

preprint2020arXiv

FMA-ETA: Estimating Travel Time Entirely Based on FFN With Attention

Estimated time of arrival (ETA) is one of the most important services in intelligent transportation systems and becomes a challenging spatial-temporal (ST) data mining task in recent years. Nowadays, deep learning based methods, specifically recurrent neural networks (RNN) based ones are adapted to model the ST patterns from massive data for ETA and become the state-of-the-art. However, RNN is suffering from slow training and inference speed, as its structure is unfriendly to parallel computing. To solve this problem, we propose a novel, brief and effective framework mainly based on feed-forward network (FFN) for ETA, FFN with Multi-factor self-Attention (FMA-ETA). The novel Multi-factor self-attention mechanism is proposed to deal with different category features and aggregate the information purposefully. Extensive experimental results on the real-world vehicle travel dataset show FMA-ETA is competitive with state-of-the-art methods in terms of the prediction accuracy with significantly better inference speed.

preprint2020arXiv

Fusion Recurrent Neural Network

Considering deep sequence learning for practical application, two representative RNNs - LSTM and GRU may come to mind first. Nevertheless, is there no chance for other RNNs? Will there be a better RNN in the future? In this work, we propose a novel, succinct and promising RNN - Fusion Recurrent Neural Network (Fusion RNN). Fusion RNN is composed of Fusion module and Transport module every time step. Fusion module realizes the multi-round fusion of the input and hidden state vector. Transport module which mainly refers to simple recurrent network calculate the hidden state and prepare to pass it to the next time step. Furthermore, in order to evaluate Fusion RNN's sequence feature extraction capability, we choose a representative data mining task for sequence data, estimated time of arrival (ETA) and present a novel model based on Fusion RNN. We contrast our method and other variants of RNN for ETA under massive vehicle travel data from DiDi Chuxing. The results demonstrate that for ETA, Fusion RNN is comparable to state-of-the-art LSTM and GRU which are more complicated than Fusion RNN.

preprint2020arXiv

GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet

Training a supernet matters for one-shot neural architecture search (NAS) methods since it serves as a basic performance estimator for different architectures (paths). Current methods mainly hold the assumption that a supernet should give a reasonable ranking over all paths. They thus treat all paths equally, and spare much effort to train paths. However, it is harsh for a single supernet to evaluate accurately on such a huge-scale search space (e.g., $7^{21}$). In this paper, instead of covering all paths, we ease the burden of supernet by encouraging it to focus more on evaluation of those potentially-good ones, which are identified using a surrogate portion of validation data. Concretely, during training, we propose a multi-path sampling strategy with rejection, and greedily filter the weak paths. The training efficiency is thus boosted since the training space has been greedily shrunk from all paths to those potentially-good ones. Moreover, we further adopt an exploration and exploitation policy by introducing an empirical candidate path pool. Our proposed method GreedyNAS is easy-to-follow, and experimental results on ImageNet dataset indicate that it can achieve better Top-1 accuracy under same search space and FLOPs or latency level, but with only $\sim$60\% of supernet training cost. By searching on a larger space, our GreedyNAS can also obtain new state-of-the-art architectures.

preprint2020arXiv

On Connections between Regularizations for Improving DNN Robustness

This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs), from a theoretical point of view. Specifically, we study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional. We investigate them on DNNs with general rectified linear activations, which constitute one of the most prevalent families of models for image classification and a host of other machine learning applications. We shed light on essential ingredients of these regularizations and re-interpret their functionality. Through the lens of our study, more principled and efficient regularizations can possibly be invented in the near future.

preprint2020arXiv

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

This paper presents a deep reinforcement learning algorithm for online accompaniment generation, with potential for real-time interactive human-machine duet improvisation. Different from offline music generation and harmonization, online music accompaniment requires the algorithm to respond to human input and generate the machine counterpart in a sequential order. We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state). The key of this algorithm is the well-functioning reward model. Instead of defining it using music composition rules, we learn this model from monophonic and polyphonic training data. This model considers the compatibility of the machine-generated note with both the machine-generated context and the human-generated context. Experiments show that this algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part. Subjective evaluations on preferences show that the proposed algorithm generates music pieces of higher quality than the baseline method.

preprint2020arXiv

Road Network Metric Learning for Estimated Time of Arrival

Recently, deep learning have achieved promising results in Estimated Time of Arrival (ETA), which is considered as predicting the travel time from the origin to the destination along a given path. One of the key techniques is to use embedding vectors to represent the elements of road network, such as the links (road segments). However, the embedding suffers from the data sparsity problem that many links in the road network are traversed by too few floating cars even in large ride-hailing platforms like Uber and DiDi. Insufficient data makes the embedding vectors in an under-fitting status, which undermines the accuracy of ETA prediction. To address the data sparsity problem, we propose the Road Network Metric Learning framework for ETA (RNML-ETA). It consists of two components: (1) a main regression task to predict the travel time, and (2) an auxiliary metric learning task to improve the quality of link embedding vectors. We further propose the triangle loss, a novel loss function to improve the efficiency of metric learning. We validated the effectiveness of RNML-ETA on large scale real-world datasets, by showing that our method outperforms the state-of-the-art model and the promotion concentrates on the cold links with few data.