Researcher profile

Chong Zhang

Chong Zhang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
12 topics
4 close collaborators

Published work

17 published items

preprint, 2026, arXiv

Bridging Behavior and Semantics for Time-aware Cross-Domain Sequential Recommendation

Cross-domain sequential recommendation (CDSR) alleviates interaction sparsity by jointly modeling user behaviors across multiple domains. While current studies have made some progress, they still neglect two issues that severely impact recommendation performance: (i) ignoring domain-specific interaction frequencies and interest decay rates at identical time intervals; (ii) treating semantic preferences as time-invariant during cross-domain transfer. To address these issues, we propose a novel framework that bridges Behavior and Semantics for Time-aware Cross-Domain Sequential Recommendation (BST-CDSR). Specifically, we design a behavioral preference evolution module that decouples long-term interests and short-term intentions, and models continuous-time preference via a neural ordinary differential equation (ODE) with event-driven updates. Additionally, to capture time-aware semantic preferences, we introduce a temporal counterfactual-enhanced semantic generator that discretizes temporal interval tokens and leverages large language models (LLMs) to extract robust temporal semantics, where counterfactual perturbations enhance the time sensitivity of semantic preferences. Furthermore, we propose a time-preference guided domain transfer module to adaptively control transfer weights and mitigate negative transfer. Extensive experiments on real-world datasets demonstrate that BST-CDSR consistently outperforms baselines.

preprint, 2026, arXiv

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

Large Language Models (LLMs) are increasingly deployed as long-term interactive agents, yet their limited context windows make it difficult to sustain coherent behavior over extended interactions. Existing memory systems often store isolated records and retrieve fragments, limiting their ability to consolidate evolving user states and resolve conflicts. We introduce EverMemOS, a self-organizing memory operating system that implements an engram-inspired lifecycle for computational memory. Episodic Trace Formation converts dialogue streams into MemCells that capture episodic traces, atomic facts, and time-bounded Foresight signals. Semantic Consolidation organizes MemCells into thematic MemScenes, distilling stable semantic structures and updating user profiles. Reconstructive Recollection performs MemScene-guided agentic retrieval to compose the necessary and sufficient context for downstream reasoning. Experiments on LoCoMo and LongMemEval show that EverMemOS achieves state-of-the-art performance on memory-augmented reasoning tasks. We further report a profile study on PersonaMem v2 and qualitative case studies illustrating chat-oriented capabilities such as user profiling and Foresight. Code is available at https://github.com/EverMind-AI/EverMemOS.

preprint, 2026, arXiv

Shifting the Sweet Spot: High-Performance Matrix-Free Method for High-Order Elasticity

In high-order finite element analysis for elasticity, matrix-free partial assembly (PA) methods are a key technology for overcoming the memory bottleneck of traditional Full Assembly (FA). However, existing implementations fail to fully exploit the special structure of modern CPU architectures and tensor-product elements, causing their performance "sweet spot" to anomalously remain at the low order of $p \approx 2$, which severely limits the potential of high-order methods. To address this challenge, we design and implement a highly optimized PA operator within the MFEM framework, deeply integrated with a Geometric Multigrid (GMG) preconditioner. Our multi-level optimization strategy includes replacing the original $O(p^6)$ generic algorithm with an efficient $O(p^4)$ one based on tensor factorization, exploiting Voigt symmetry to reduce redundant computations for the elasticity problem, and employing macro-kernel fusion to enhance data locality and break the memory bandwidth bottleneck. Extensive experiments on mainstream x86 and ARM architectures demonstrate that our method successfully shifts the performance "sweet spot" to the higher-order region of $p \ge 6$. Compared to the MFEM baseline, the optimized core operator (kernel) achieves speedups of 7x to 83x, which translates to a 3.6x to 16.8x end-to-end performance improvement in the complete solution process. This paper provides a validated and efficient practical path for conducting large-scale, high-order elasticity simulations on mainstream CPU hardware.
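The abstract does not spell out the kernels, so the following is only a minimal sketch of the general sum-factorization idea behind an $O(p^6)$ to $O(p^4)$ reduction on tensor-product elements. It assumes a hypothetical 1D matrix `B` of size n x n (with n on the order of p + 1) applied along each of three directions; the function names are illustrative and are not MFEM's API or the authors' operator.

```python
import random


def apply_naive(B, x):
    """y[i][j][k] = sum over a,b,c of B[i][a]*B[j][b]*B[k][c]*x[a][b][c].

    Six nested loops of length n, so O(n^6) work.
    """
    n = len(B)
    y = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                s = 0.0
                for a in range(n):
                    for b in range(n):
                        for c in range(n):
                            s += B[i][a] * B[j][b] * B[k][c] * x[a][b][c]
                y[i][j][k] = s
    return y


def apply_sum_factorized(B, x):
    """Same result, contracting one index at a time: three passes of O(n^4)."""
    n = len(B)
    rng = range(n)
    # Pass 1: t1[i][b][c] = sum_a B[i][a] * x[a][b][c]
    t1 = [[[sum(B[i][a] * x[a][b][c] for a in rng) for c in rng]
           for b in rng] for i in rng]
    # Pass 2: t2[i][j][c] = sum_b B[j][b] * t1[i][b][c]
    t2 = [[[sum(B[j][b] * t1[i][b][c] for b in rng) for c in rng]
           for j in rng] for i in rng]
    # Pass 3: y[i][j][k] = sum_c B[k][c] * t2[i][j][c]
    return [[[sum(B[k][c] * t2[i][j][c] for c in rng) for k in rng]
             for j in rng] for i in rng]


if __name__ == "__main__":
    random.seed(0)
    n = 4  # hypothetical: n = p + 1 points per direction for p = 3
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    x = [[[random.random() for _ in range(n)] for _ in range(n)]
         for _ in range(n)]
    y_naive = apply_naive(B, x)
    y_fast = apply_sum_factorized(B, x)
    err = max(abs(y_naive[i][j][k] - y_fast[i][j][k])
              for i in range(n) for j in range(n) for k in range(n))
    print("max difference:", err)
```

Both functions compute the same tensor contraction; the second simply reorders the sums so that each pass touches n^3 outputs with an inner loop of length n.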

preprint, 2025, arXiv

Dual prototype attentive graph network for cross-market recommendation

Cross-market recommender systems (CMRS) aim to utilize historical data from mature markets to promote multinational products in emerging markets. However, existing CMRS approaches often overlook the potential for shared preferences among users in different markets, focusing primarily on modeling specific preferences within each market. In this paper, we argue that incorporating both market-specific and market-shared insights can enhance the generalizability and robustness of CMRS. We propose a novel approach called Dual Prototype Attentive Graph Network for Cross-Market Recommendation (DGRE) to address this. DGRE leverages prototypes based on graph representation learning from both items and users to capture market-specific and market-shared insights. Specifically, DGRE incorporates market-shared prototypes by clustering users from various markets to identify behavioural similarities and create market-shared user profiles. Additionally, it constructs item-side prototypes by aggregating item features within each market, providing valuable market-specific insights. We conduct extensive experiments to validate the effectiveness of DGRE on a real-world cross-market dataset, and the results show that considering both market-specific and market-sharing aspects in modelling can improve the generalization and robustness of CMRS.

preprint, 2022, arXiv

Accessibility-Based Clustering for Efficient Learning of Locomotion Skills

For model-free deep reinforcement learning of quadruped locomotion, the initialization of robot configurations is crucial for data efficiency and robustness. This work focuses on algorithmic improvements of data efficiency and robustness simultaneously through automatic discovery of initial states, which is achieved by our proposed K-Access algorithm based on accessibility metrics. Specifically, we formulated accessibility metrics to measure the difficulty of transitions between two arbitrary states, and proposed a novel K-Access algorithm for state-space clustering that automatically discovers the centroids of the static-pose clusters based on the accessibility metrics. By using the discovered centroidal static poses as the initial states, we can improve data efficiency by reducing redundant explorations, and enhance the robustness by more effective explorations from the centroids to sampled poses. Focusing on fall recovery as a very hard set of locomotion skills, we validated our method extensively using an 8-DoF quadrupedal robot Bittle. Compared to the baselines, the learning curve of our method converges much faster, requiring only 60% of training episodes. With our method, the robot can successfully recover to standing poses within 3 seconds in 99.4% of the test cases. Moreover, the method can generalize to other difficult skills successfully, such as backflipping.

preprint, 2022, arXiv

Custom Sine Waves Are Enough for Imitation Learning of Bipedal Gaits with Different Styles

Only recently has robust bipedal locomotion been achieved through reinforcement learning. However, existing implementations rely heavily on insights and efforts from human experts, which is costly for the iterative design of robot systems. Also, the styles of the learned motion are strictly limited to that of the reference. In this paper, we propose a new way to learn bipedal locomotion from a simple sine wave as the reference for foot heights. With the naive human insight that the two feet should be lifted alternately and periodically, we experimentally demonstrate on the Cassie robot that a simple reward function is able to make the robot learn to walk end-to-end and efficiently without any explicit knowledge of the model. With custom sine waves, the learned gait pattern can also have customized styles. Code is released at github.com/WooQi57/sin-cassie-rl.
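As a rough illustration of a sine-wave foot-height reference, the sketch below generates alternating, periodic height targets for two feet and scores how closely measured heights track them. Everything here is an assumption made for illustration: the function names, the clipping of the sine at zero, the exponential reward shape, and the default `period`, `max_height`, and `scale` values are hypothetical and are not taken from the paper or its released code.

```python
import math


def foot_height_refs(t, period=0.8, max_height=0.1):
    """Return (left, right) reference foot heights at time t.

    The right foot is shifted by half a period, so the feet lift alternately.
    Negative sine values are clipped to zero (foot on the ground); this
    clipping is an assumption of this sketch.
    """
    phase = 2.0 * math.pi * t / period
    left = max_height * max(0.0, math.sin(phase))
    right = max_height * max(0.0, math.sin(phase + math.pi))
    return left, right


def tracking_reward(left, right, t, scale=50.0, period=0.8, max_height=0.1):
    """Hypothetical reward in (0, 1]: 1 when both feet match the reference."""
    ref_left, ref_right = foot_height_refs(t, period, max_height)
    err = (left - ref_left) ** 2 + (right - ref_right) ** 2
    return math.exp(-scale * err)


if __name__ == "__main__":
    for step in range(5):
        t = 0.2 * step
        print(t, foot_height_refs(t))
```

Changing `period` or `max_height` changes the stepping frequency and lift height of the reference, which is one simple way a sine reference could encode different gait styles.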

preprint, 2022, arXiv

Hierarchical information matters: Text classification via tree based graph neural network

Text classification is a primary task in natural language processing (NLP). Recently, graph neural networks (GNNs) have developed rapidly and been applied to text classification tasks. As a special kind of graph data, the tree has a simpler data structure and can provide rich hierarchical information for text classification. Inspired by structural entropy, we construct the coding tree of the graph by minimizing the structural entropy and propose HINT, which aims to make full use of the hierarchical information contained in the text for the task of text classification. Specifically, we first establish a dependency parsing graph for each text. Then we design a structural entropy minimization algorithm to decode the key information in the graph and convert each graph to its corresponding coding tree. Based on the hierarchical structure of the coding tree, the representation of the entire graph is obtained by updating the representation of non-leaf nodes in the coding tree layer by layer. Finally, we present the effectiveness of hierarchical information in text classification. Experimental results show that HINT outperforms the state-of-the-art methods on popular benchmarks while having a simple structure and few parameters.

preprint, 2022, arXiv

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Noise robustness in keyword spotting remains a challenge, as many models fail to overcome the heavy influence of noise, which degrades the quality of feature embeddings. We propose a contrastive regularization method called Inter-Intra Contrastive Regularization (I2CR) to improve the feature representations by guiding the model to learn the fundamental speech information specific to the cluster. This involves maximizing the similarity across Intra and Inter samples of the same class. As a result, it pulls the instances closer to more generalized representations that form more prominent clusters and reduces the adverse impact of noise. We show that our method provides consistent improvements in accuracy over different backbone model architectures under different noise environments. We also demonstrate that our proposed framework improves accuracy on unseen out-of-domain noises and unseen noise SNRs. This indicates the significance of our work with the overall refinement in noise robustness.

preprint, 2022, arXiv

Learning Ball-balancing Robot Through Deep Reinforcement Learning

The ball-balancing robot (ballbot) is a good platform to test the effectiveness of a balancing controller. For balancing control, conventional model-based feedback control methods have been widely used. However, contacts and collisions are difficult to model, and often lead to failure in balancing control, especially when the ballbot tilts by a large angle. To explore the maximum initial tilting angle of the ballbot, the balancing control is interpreted as a recovery task using Reinforcement Learning (RL). RL is a powerful technique for systems that are difficult to model, because it allows an agent to learn a policy by interacting with the environment. In this paper, by combining the conventional feedback controller with the RL method, a compound controller is proposed. We show the effectiveness of the compound controller by training an agent to successfully perform a recovery task involving contacts and collisions. Simulation results demonstrate that using the compound controller, the ballbot can keep balance under a larger set of initial tilting angles, compared to the conventional model-based controller.

preprint, 2022, arXiv

Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
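The abstract says only that rankings of model outputs are collected and used for reinforcement learning from human feedback. One common way to turn such rankings into a training signal for a reward model is a pairwise logistic loss, sketched below under that assumption; the function name and the example scores are hypothetical, and no model or optimizer is included.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(ranked_rewards):
    """Mean pairwise logistic loss over one labeler ranking.

    ranked_rewards: scalar reward-model scores for several outputs to the
    same prompt, ordered from most preferred to least preferred.
    Requires at least two scores.
    """
    # combinations() keeps list order, so each pair is (preferred, less preferred).
    pairs = list(combinations(ranked_rewards, 2))
    total = 0.0
    for r_preferred, r_other in pairs:
        # -log(sigmoid(d)) written as log(1 + exp(-d)), with d the score gap.
        total += math.log1p(math.exp(-(r_preferred - r_other)))
    return total / len(pairs)


if __name__ == "__main__":
    # Scores that agree with the ranking give a lower loss than scores that invert it.
    print(pairwise_ranking_loss([2.0, 1.0, 0.0]))
    print(pairwise_ranking_loss([0.0, 1.0, 2.0]))
```

The loss is small when the reward model scores preferred outputs higher than less preferred ones, and grows when the order is inverted.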

preprint, 2021, arXiv

DROID: Minimizing the Reality Gap using Single-Shot Human Demonstration

Reinforcement learning (RL) has demonstrated great success in the past several years. However, most of the scenarios focus on simulated environments. One of the main challenges of transferring the policy learned in a simulated environment to the real world is the discrepancy between the dynamics of the two environments. In prior works, Domain Randomization (DR) has been used to address the reality gap for both robotic locomotion and manipulation tasks. In this paper, we propose Domain Randomization Optimization IDentification (DROID), a novel framework to exploit single-shot human demonstration for identifying the simulator's distribution of dynamics parameters, and apply it to training a policy on a door opening task. Our results show that the proposed framework can identify the difference in dynamics between the simulated and the real worlds, and thus improve policy transfer by optimizing the simulator's randomization ranges. We further illustrate that based on these same identified parameters, our method can generalize the learned policy to different but related tasks.

preprint, 2020, arXiv

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Despite the notable progress made in action recognition tasks, not much work has been done in action recognition specifically for human-robot interaction. In this paper, we deeply explore the characteristics of the action recognition task in interaction scenarios and propose an attention-oriented multi-level network framework to meet the need for real-time interaction. Specifically, a Pre-Attention network is employed first to roughly focus on the interactor in the scene at low resolution and then to perform fine-grained pose estimation at high resolution. A second, compact CNN receives the extracted skeleton sequence as input for action recognition, utilizing attention-like mechanisms to capture local spatial-temporal patterns and global semantic information effectively. To evaluate our approach, we construct a new action dataset specifically for the recognition task in interaction scenarios. Experimental results on our dataset and high efficiency (112 fps at 640 x 480 RGBD) on the mobile computing platform (Nvidia Jetson AGX Xavier) demonstrate excellent applicability of our method on action recognition in real-time human-robot interaction.

preprint, 2020, arXiv

Distinguished regular supercuspidal representations

Based on recent work of Kaletha, we apply Hakim--Murnaghan's result to study distinguished regular supercuspidal representations of tamely ramified reductive $p$-adic groups. Assuming $p$ is sufficiently large, we obtain a necessary and sufficient condition for regular supercuspidal representations to be distinguished. We also investigate the relation between the distinction problem and the Langlands functoriality, and confirm a conjecture of Lapid for regular depth-zero or epipelagic supercuspidal representations.

preprint, 2020, arXiv

KGClean: An Embedding Powered Knowledge Graph Cleaning Framework

The quality assurance of the knowledge graph is a prerequisite for various knowledge-driven applications. We propose KGClean, a novel cleaning framework powered by knowledge graph embedding, to detect and repair heterogeneous dirty data. In contrast to previous approaches that either focus on filling in missing data or clean only errors that violate a limited set of rules, KGClean enables (i) cleaning both missing data and other erroneous values, and (ii) mining potential rules automatically, which expands the coverage of error detection. KGClean first learns data representations with TransGAT, an effective knowledge graph embedding model, which gathers the neighborhood information of each data item and incorporates the interactions among data items to cast data into continuous vector spaces with rich semantics. KGClean integrates an active learning-based classification model, which identifies errors with a small seed of labels. KGClean utilizes an efficient PRO-repair strategy to repair errors using a novel concept of propagation power. Extensive experiments on four typical knowledge graphs demonstrate the effectiveness of KGClean in practice.

preprint, 2020, arXiv

Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation

In recognition-based action interaction, robots' responses to human actions are often pre-designed according to recognized categories and thus stiff. In this paper, we specify a new Interactive Action Translation (IAT) task which aims to learn end-to-end action interaction from unlabeled interactive pairs, removing explicit action recognition. To enable learning on small-scale data, we propose a Paired-Embedding (PE) method for effective and reliable data augmentation. Specifically, our method first utilizes paired relationships to cluster individual actions in an embedding space. Then two actions originally paired can be replaced with other actions in their respective neighborhood, assembling into new pairs. An Act2Act network based on conditional GAN follows to learn from augmented data. Besides, IAT-test and IAT-train scores are specifically proposed for evaluating methods on our task. Experimental results on two datasets show impressive effects and broad application prospects of our method.

preprint, 2020, arXiv

Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection

Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models. However, existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift. In this work, we propose a Self-Guided Adaptation (SGA) model, which targets aligning feature representation and transferring object detection models across domains while considering the instantaneous alignment difficulty. The core of SGA is to calculate "hardness" factors for sample pairs indicating domain distance in a kernel space. With the hardness factor, the proposed SGA adaptively indicates the importance of samples and assigns them different constraints. Indicated by hardness factors, Self-Guided Progressive Sampling (SPS) is implemented in an "easy-to-hard" way during model adaptation. Using multi-stage convolutional features, SGA is further aggregated to fully align hierarchical representations of detection models. Extensive experiments on commonly used benchmarks show that SGA improves on state-of-the-art methods by significant margins, while demonstrating its effectiveness on large domain shift.

preprint, 2020, arXiv

Theta lifts and distinction for regular supercuspidal representations

This article has a twofold purpose. First, by recent works of Kaletha and Loke-Ma, we give an explicit description of the local theta correspondence between regular supercuspidal representations in the equal rank symplectic-orthogonal case. Second, based on this description, we show that the local theta correspondence preserves distinction with respect to unramified Galois involutions.