Source author record

Chao Huang

Chao Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

53 works

23 topics

4 close collaborators

preprint 2026 arXiv

Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method

Recent progress in reasoning capabilities of Multimodal Large Language Models (MLLMs) has highlighted their potential for performing complex video understanding tasks. However, in the domain of Video Anomaly Detection and Understanding (VAD&U), existing MLLM-based methods are largely limited to anomaly localization or post-hoc description, lacking explicit reasoning processes, risk awareness, and decision-oriented interpretation. To address this gap, we define a new task termed Video Anomaly Reasoning (VAR), which elevates video anomaly analysis from descriptive understanding to structured, multi-stage reasoning. VAR explicitly requires models to perform progressive reasoning over anomalous events before answering anomaly-related questions, encompassing visual perception, causal interpretation, and risk-aware decision making. To support this task, we present a new dataset with 8,641 videos, where each video is annotated with diverse question types corresponding to different reasoning depths, totaling more than 50,000 samples, making it one of the largest datasets for video anomaly analysis. The annotations are based on a structured Perception-Cognition-Action Chain-of-Thought (PerCoAct-CoT), which formalizes domain-specific reasoning priors for video anomaly understanding. This design enables systematic evaluation of multi-stage and adaptive anomaly reasoning. In addition, we propose Anomaly-Aware Group Relative Policy Optimization to further enhance reasoning reliability under weak supervision. Building upon the proposed task and dataset, we develop an end-to-end MLLM-based VAR model termed Vad-R1-Plus, which supports adaptive hierarchical reasoning and risk-aware decision making. Extensive experiments demonstrate that the proposed benchmark and method effectively advance the reasoning capabilities of MLLMs on VAR tasks, outperforming both open-source and proprietary baselines.

preprint 2026 arXiv

Efficient Data Selection for Multimodal Models via Incremental Optimization Utility

The scaling of Large Multimodal Models (LMMs) is constrained by the quality-quantity trade-off inherent in synthetic data. Previous approaches, such as LLM-as-a-Judge, have proven their effectiveness in addressing this but suffer from prohibitive computational costs and lack of interpretability. To bridge this gap, we propose One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. Instead of relying on semantic heuristics, OST estimates the marginal utility of each sample via a simulated single-step update on a lightweight proxy. Experiments on the Qwen series across multimodal mathematical reasoning benchmarks demonstrate that OST achieves Pareto-optimal efficiency. By selecting the top-50 subset, OST reduces training costs by 43% (and total time consumption by 17) while surpassing the strong LLM-as-a-Judge baseline by 1.8 points. Furthermore, under a fixed compute budget, our method using only the top-20 subset achieves a 5.6 point gain over LLM-as-a-Judge, improves upon heuristic scoring baselines like DEITA, and outperforms the Full-SFT baseline by 8.8 points. Notably, while Full-SFT suffers from performance degradation due to noise, our optimization-grounded approach effectively identifies toxic samples, successfully reversing the negative transfer frequently observed in complex reasoning tasks.
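The idea of scoring a sample by a simulated single-step update can be illustrated with a toy example. The sketch below is not the OST implementation: it assumes a linear least-squares proxy, a small clean validation set, and a utility defined as the drop in validation loss after one SGD step on the candidate sample. All function names and the learning rate are hypothetical.

```python
import numpy as np

def mse_loss(w, X, y):
    """Mean squared error of a linear proxy model with weights w."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))

def one_step_utility(w, x_i, y_i, X_val, y_val, lr=0.1):
    """Validation-loss drop after one simulated SGD step on a single sample.

    Assumption: utility = loss before the step minus loss after the step.
    The proxy weights w are not modified.
    """
    grad = 2.0 * (x_i @ w - y_i) * x_i   # gradient of (x_i . w - y_i)^2
    w_next = w - lr * grad               # simulated update only
    return mse_loss(w, X_val, y_val) - mse_loss(w_next, X_val, y_val)

def rank_by_utility(w, X_pool, y_pool, X_val, y_val, lr=0.1):
    """Return pool indices sorted from highest to lowest utility, plus scores."""
    scores = np.array([
        one_step_utility(w, x, y, X_val, y_val, lr)
        for x, y in zip(X_pool, y_pool)
    ])
    return np.argsort(-scores), scores

# Toy pool: sample 0 agrees with the validation data, sample 1 has a flipped label.
w0 = np.zeros(2)
X_val = np.eye(2)
y_val = np.array([1.0, 0.0])
X_pool = np.array([[1.0, 0.0], [1.0, 0.0]])
y_pool = np.array([1.0, -1.0])
order, scores = rank_by_utility(w0, X_pool, y_pool, X_val, y_val)
```

In this toy setting the mislabeled sample receives a negative score, which mirrors the abstract's point that an optimization-grounded score can flag harmful samples without semantic heuristics.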

preprint 2026 arXiv

FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference

Distributed inference serves as a promising approach to enabling the inference of large language models (LLMs) at the network edge. It distributes the inference process to multiple devices to ensure that the LLMs can fit into the device memory. Recent pipeline-based approaches have the potential to parallelize communication and computation, which helps reduce inference latency. However, the benefit diminishes when inference requests at the network edge are sparse, where the pipeline is typically at low utilization. To enable efficient distributed LLM inference at the edge, we propose FlowSpec, a pipeline-parallel tree-based speculative decoding framework. FlowSpec incorporates three key mechanisms to improve decoding efficiency: 1) score-based step-wise verification prioritizes more important draft tokens to bring earlier accepted tokens; 2) efficient draft management to prune invalid tokens while maintaining correct causal relationship during verification; 3) dynamic draft expansion strategies to supply high-quality speculative inputs. These techniques work in concert to enhance both pipeline utilization and speculative efficiency. We evaluate FlowSpec on a real-world testbed with other baselines. Experimental results demonstrate that our proposed framework significantly improves inference speed across diverse models and configurations, achieving speedup ratios of 1.37× to 1.73× compared to baselines. Our code is publicly available at https://github.com/Leosang-lx/FlowSpec#.

preprint 2026 arXiv

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelligence. Safactory integrates three tightly coupled platforms: a Parallel Simulation Platform for trajectory generation, a Trustworthy Data Platform for trajectory storage and experience extraction, and an Autonomous Evolution Platform for asynchronous reinforcement learning and on-policy distillation. As far as we know, Safactory is the first framework to propose a unified evolutionary pipeline for next-generation trustworthy autonomous intelligence.

preprint 2026 arXiv

Semantic visually-guided acoustic highlighting with large vision-language models

Balancing dialogue, music, and sound effects with accompanying video is crucial for immersive storytelling, yet current audio mixing workflows remain largely manual and labor-intensive. While recent advancements have introduced the visually guided acoustic highlighting task, which implicitly rebalances audio sources using multimodal guidance, it remains unclear which visual aspects are most effective as conditioning signals. We address this gap through a systematic study of whether deep video understanding improves audio remixing. Using textual descriptions as a proxy for visual analysis, we prompt large vision-language models to extract six types of visual-semantic aspects, including object and character appearance, emotion, camera focus, tone, scene background, and inferred sound-related cues. Through extensive experiments, camera focus, tone, and scene background consistently yield the largest improvements in perceptual mix quality over state-of-the-art baselines. Our findings (i) identify which visual-semantic cues most strongly support coherent and visually aligned audio remixing, and (ii) outline a practical path toward automating cinema-grade sound design using lightweight guidance derived from large vision-language models.

preprint 2026 arXiv

SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere

Video anomaly detection (VAD) aims to automatically identify events that deviate from normal patterns in untrimmed surveillance videos. Existing methods universally depend on large-scale annotations or task-specific training procedures, severely limiting their rapid deployment to novel scenes. We observe that intermediate-layer features of pre-trained multimodal large language models (MLLMs) already encode rich anomaly semantics, yet existing approaches rely on the language output pathway and fail to exploit the geometric discriminability latent in these representations. Based on this finding, we propose SphereVAD, a fully training-free, zero-shot VAD framework that recasts anomaly discrimination as von Mises-Fisher (vMF) likelihood-ratio geodesic inference on the unit hypersphere, unleashing latent discriminability through principled geometric reasoning rather than learning new representations. Specifically, SphereVAD first applies Frechet mean centering to unfold feature distributions and eliminate domain biases, then employs Holistic Scene Attention (HSA) to reinforce feature consistency using cross-video priors, and finally performs vMF-guided Spherical Geodesic Pulling (SGP) to align ambiguous segments with directional prototypes on the spherical manifold. This training-free pipeline requires only minimal synthetic images for calibration. SphereVAD establishes new state-of-the-art results among training-free approaches on three major benchmarks and remains competitive with fully supervised baselines. Code will be available upon acceptance.
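A von Mises-Fisher likelihood ratio on the unit hypersphere has a simple closed form when both classes share one concentration parameter, because the normalizing constants cancel. The sketch below illustrates only that textbook identity and is not the SphereVAD pipeline: the prototype directions, the shared `kappa`, and all function names are assumptions made for illustration.

```python
import numpy as np

def unit(v):
    """Project a vector onto the unit hypersphere."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def vmf_log_likelihood_ratio(x, mu_anom, mu_norm, kappa=10.0):
    """log p(x | anomalous) - log p(x | normal) under vMF densities.

    Assumption: both classes share the same concentration kappa, so the
    normalizing constants cancel and the ratio reduces to
    kappa * (mu_anom - mu_norm) . x.
    """
    x, mu_anom, mu_norm = unit(x), unit(mu_anom), unit(mu_norm)
    return float(kappa * (mu_anom - mu_norm) @ x)

def geodesic_distance(u, v):
    """Great-circle distance in radians between two directions."""
    cosine = np.clip(unit(u) @ unit(v), -1.0, 1.0)
    return float(np.arccos(cosine))

# Toy prototypes: orthogonal "anomalous" and "normal" directions.
mu_a = np.array([1.0, 0.0, 0.0])
mu_n = np.array([0.0, 1.0, 0.0])
score_anom = vmf_log_likelihood_ratio([2.0, 0.0, 0.0], mu_a, mu_n)
score_norm = vmf_log_likelihood_ratio([0.0, 3.0, 0.0], mu_a, mu_n)
```

A positive score means the feature direction lies closer to the anomalous prototype and a negative score means it lies closer to the normal one; input magnitude is irrelevant because every vector is normalized first.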

preprint 2024 arXiv

LLMRec: Large Language Models with Graph Augmentation for Recommendation

The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. In light of the recent advancements in large language models (LLMs), which possess extensive knowledge bases and strong reasoning capabilities, we propose a novel framework called LLMRec that enhances recommender systems by employing three simple yet effective LLM-based graph augmentation strategies. Our approach leverages the rich content available within online platforms (e.g., Netflix, MovieLens) to augment the interaction graph in three ways: (i) reinforcing user-item interaction edges, (ii) enhancing the understanding of item node attributes, and (iii) conducting user node profiling, intuitively from the natural language perspective. By employing these strategies, we address the challenges posed by sparse implicit feedback and low-quality side information in recommenders. Besides, to ensure the quality of the augmentation, we develop a denoised data robustification mechanism that includes techniques of noisy implicit feedback pruning and MAE-based feature enhancement that help refine the augmented data and improve its reliability. Furthermore, we provide theoretical analysis to support the effectiveness of LLMRec and clarify the benefits of our method in facilitating model optimization. Experimental results on benchmark datasets demonstrate the superiority of our LLM-based augmentation approach over state-of-the-art techniques. To ensure reproducibility, we have made our code and augmented data publicly available at: https://github.com/HKUDS/LLMRec.git

preprint 2022 arXiv

A Tool for Neural Network Global Robustness Certification and Training

With growing interest in leveraging machine learning technology in safety-critical systems, the robustness of neural networks under external disturbance has drawn increasing concern. Global robustness is a robustness property defined on the entire input domain, and a certified globally robust network can ensure its robustness on any possible network input. However, the state-of-the-art global robustness certification algorithm can only certify networks with at most several thousand neurons. In this paper, we propose the GPU-supported global robustness certification framework GROCET, which is more efficient than the previous optimization-based certification approach. Moreover, GROCET provides differentiable global robustness, which is leveraged in the training of globally robust neural networks.

preprint 2022 arXiv

Atomic Filter: a Weak Form of Shift Operator for Graph Signals

The shift operation plays a crucial role in classical signal processing. It is the generator of all the filters and the basic operation for time-frequency analysis, such as windowed Fourier transform and wavelet transform. With the rapid development of internet technology and big data science, a large amount of data are expressed as signals defined on graphs. In order to establish the theory of filtering, windowed Fourier transform and wavelet transform in the setting of graph signals, we need to extend the shift operation of classical signals to graph signals. It is a fundamental problem since the vertex set of a graph is usually not a vector space and the addition operation cannot be defined on the vertex set of the graph. In this paper, based on our understanding of the core role of the shift operation in classical signal processing, we propose the concept of atomic filters, which can be viewed as a weak form of the shift operator for graph signals. Then, we study the conditions such that an atomic filter is norm-preserving, periodic, or real-preserving. The real-preserving property holds naturally in classical signal processing, but no research has been reported on this topic in the graph signal setting. With these conditions we propose the concept of normal atomic filters for graph signals, which degenerates into the classical shift operator under mild conditions if the graph is circulant. Typical examples of graphs that do or do not have normal atomic filters are given. Finally, as an application, atomic filters are utilized to construct time-frequency atoms which constitute a frame of the graph signal space.

preprint 2022 arXiv

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

As the deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into latent feature space, based on various neural architectures, such as multi-layer perceptron, auto-encoder and graph neural networks. However, the majority of existing collaborative filtering systems are not well designed to handle missing data. Particularly, in order to inject the negative signals in the training phase, these solutions largely rely on negative sampling from unobserved user-item interactions and simply treating them as negative instances, which brings the recommendation performance degradation. To address the issues, we develop a Collaborative Reflection-Augmented Autoencoder Network (CRANet), that is capable of exploring transferable knowledge from observed and unobserved user-item interactions. The network architecture of CRANet is formed of an integrative structure with a reflective receptor network and an information fusion autoencoder module, which endows our recommendation framework with the ability of encoding implicit user's pairwise preference on both interacted and non-interacted items. Additionally, a parametric regularization-based tied-weight scheme is designed to perform robust joint training of the two-stage CRANet model. We finally experimentally validate CRANet on four diverse benchmark datasets corresponding to two recommendation tasks, to show that debiasing the negative signals of user-item interactions improves the performance as compared to various state-of-the-art recommendation techniques. Our source code is available at https://github.com/akaxlh/CRANet.

preprint 2022 arXiv

Contrastive Meta Learning with Behavior Multiplicity for Recommendation

A well-informed recommendation framework could not only help users identify items of interest, but also benefit the revenue of various online platforms (e.g., e-commerce, social media). Traditional recommendation models usually assume that only a single type of interaction exists between user and item, and fail to model the multiplex user-item relationships from multi-typed user behavior data, such as page view, add-to-favourite and purchase. While some recent studies propose to capture the dependencies across different types of behaviors, two important challenges have been less explored: i) Dealing with the sparse supervision signal under target behaviors (e.g., purchase). ii) Capturing the personalized multi-behavior patterns with customized dependency modeling. To tackle the above challenges, we devise a new model, Contrastive Meta Learning (CML), to maintain dedicated cross-type behavior dependency for different users. In particular, we propose a multi-behavior contrastive learning framework to distill transferable knowledge across different types of behaviors via the constructed contrastive loss. In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users. Extensive experiments on three real-world datasets indicate that our method consistently outperforms various state-of-the-art recommendation methods. Our empirical studies further suggest that the contrastive meta learning paradigm offers great potential for capturing the behavior multiplicity in recommendation. We release our model implementation at: https://github.com/weiwei1206/CML.git.
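A contrastive loss across behavior types can be illustrated with the widely used InfoNCE objective. The sketch below is not taken from the CML code: it assumes that each row of `view_a` and `view_b` holds the same user's embedding under two behavior types (for example view and purchase), that the matching row is the positive, and that every other row is a negative. The temperature value is arbitrary.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.2):
    """InfoNCE loss between two embedding views of the same users.

    Row i of view_a and row i of view_b form the positive pair;
    all other rows of view_b serve as negatives for row i.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                     # cosine similarity / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy check: aligned views should give a lower loss than mismatched views.
users_view = np.eye(3)
aligned_loss = info_nce(users_view, users_view)
mismatched_loss = info_nce(users_view, np.roll(users_view, 1, axis=0))
```

Minimizing this loss pulls a user's embeddings from different behavior types together while pushing apart embeddings of different users, which is one common way to transfer signal from dense behaviors to a sparse target behavior.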

preprint 2022 arXiv

Cross-Silo Federated Learning: Challenges and Opportunities

Federated learning (FL) is an emerging technology that enables the training of machine learning models from multiple clients while keeping the data distributed and private. Based on the participating clients and the model training scale, federated learning can be classified into two types: cross-device FL where clients are typically mobile devices and the client number can reach up to a scale of millions; cross-silo FL where clients are organizations or companies and the client number is usually small (e.g., within a hundred). While existing studies mainly focus on cross-device FL, this paper aims to provide an overview of the cross-silo FL. More specifically, we first discuss applications of cross-silo FL and outline its major challenges. We then provide a systematic overview of the existing approaches to the challenges in cross-silo FL by focusing on their connections and differences to cross-device FL. Finally, we discuss future directions and open issues that merit research efforts from the community.

preprint 2022 arXiv

Decision Making for Connected Automated Vehicles at Urban Intersections Considering Social and Individual Benefits

To address the coordination issue of connected automated vehicles (CAVs) at urban scenarios, a game-theoretic decision-making framework is proposed that can advance social benefits, including the traffic system efficiency and safety, as well as the benefits of individual users. Under the proposed decision-making framework, in this work, a representative urban driving scenario, i.e. the unsignalized intersection, is investigated. Once the vehicle enters the focused zone, it will interact with other CAVs and make collaborative decisions. To evaluate the safety risk of surrounding vehicles and reduce the complexity of the decision-making algorithm, the driving risk assessment algorithm is designed with a Gaussian potential field approach. The decision-making cost function is constructed by considering the driving safety and passing efficiency of CAVs. Additionally, decision-making constraints are designed and include safety, comfort, efficiency, control and stability. Based on the cost function and constraints, the fuzzy coalitional game approach is applied to the decision-making issue of CAVs at unsignalized intersections. Two types of fuzzy coalitions are constructed that reflect both individual and social benefits. The benefit allocation in the two types of fuzzy coalitions is associated with the driving aggressiveness of CAVs. Finally, the effectiveness and feasibility of the proposed decision-making framework are verified with three test cases.
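A Gaussian potential field assigns each surrounding vehicle a risk value that decays smoothly with distance. The sketch below is a generic illustration rather than the paper's model: the anisotropic form, the spread parameters `sigma_x` and `sigma_y`, the amplitude, and the threshold used to discard low-risk vehicles are all assumed values.

```python
import math

def gaussian_risk(dx, dy, sigma_x=10.0, sigma_y=2.0, amplitude=1.0):
    """Risk in (0, amplitude] from longitudinal offset dx and lateral offset dy (metres).

    Assumption: an axis-aligned Gaussian that is wider along the driving direction.
    """
    exponent = dx ** 2 / (2.0 * sigma_x ** 2) + dy ** 2 / (2.0 * sigma_y ** 2)
    return amplitude * math.exp(-exponent)

def vehicles_to_consider(ego_xy, others_xy, threshold=0.05):
    """Indices of vehicles whose risk reaches the threshold.

    Dropping low-risk vehicles is one way a risk field can shrink the
    decision problem; the threshold here is hypothetical.
    """
    ex, ey = ego_xy
    return [
        i for i, (ox, oy) in enumerate(others_xy)
        if gaussian_risk(ox - ex, oy - ey) >= threshold
    ]

# One nearby vehicle ahead, one far ahead, one far to the side.
selected = vehicles_to_consider((0.0, 0.0), [(5.0, 0.0), (100.0, 0.0), (0.0, 10.0)])
```

Only the nearby vehicle is kept in this example, so a downstream game-theoretic solver would have one opponent to reason about instead of three.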

preprint 2022 arXiv

Driving Conflict Resolution of Autonomous Vehicles at Unsignalized Intersections: A Differential Game Approach

Considering personalized driving preferences, a new decision-making framework is developed using a differential game approach to resolve the driving conflicts of autonomous vehicles (AVs) at unsignalized intersections. To realize human-like driving and personalized decision-making, driving aggressiveness is first defined for AVs. To improve driving safety, a Gaussian potential field model is built for collision risk assessment. Besides, in the proposed decision making framework, the collision risk assessment model is further used to reduce the computational complexity based on an event-triggered mechanism. In the construction of payoff function, both driving safety and passing efficiency are comprehensively considered, and the driving aggressiveness is also reflected. Two kinds of equilibrium solution to the differential game, i.e., the Nash equilibrium and Stackelberg equilibrium, are discussed and solved. Finally, the proposed decision making algorithm is tested through a hardware-in-the-loop testing platform, and its feasibility, effectiveness and real-time implementation performance are validated.

preprint 2022 arXiv

Efficient Global Robustness Certification of Neural Networks via Interleaving Twin-Network Encoding

The robustness of deep neural networks has received significant interest recently, especially when being deployed in safety-critical systems, as it is important to analyze how sensitive the model output is under input perturbations. While most previous works focused on the local robustness property around an input sample, the studies of the global robustness property, which bounds the maximum output change under perturbations over the entire input space, are still lacking. In this work, we formulate the global robustness certification for neural networks with ReLU activation functions as a mixed-integer linear programming (MILP) problem, and present an efficient approach to address it. Our approach includes a novel interleaving twin-network encoding scheme, where two copies of the neural network are encoded side-by-side with extra interleaving dependencies added between them, and an over-approximation algorithm leveraging relaxation and refinement techniques to reduce complexity. Experiments demonstrate the timing efficiency of our work when compared with previous global robustness certification methods and the tightness of our over-approximation. A case study of closed-loop control safety verification is conducted, and demonstrates the importance and practicality of our approach for certifying the global robustness of neural networks in safety-critical systems.
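The global robustness property described above can be written compactly. The formulation below is a sketch: the abstract does not fix the norm or the symbols, so the infinity norm, the input domain $X$, the perturbation bound $\delta$, and the output bound $\varepsilon$ are assumptions.

```latex
% Global robustness of a network f over the whole input domain X
\[
\forall x, x' \in X:\quad
\|x - x'\|_\infty \le \delta
\;\Longrightarrow\;
|f_j(x) - f_j(x')| \le \varepsilon
\quad \text{for every output index } j .
\]

% Certification seeks the tightest such bound for each output
\[
\varepsilon_j^\star(\delta)
= \max_{\substack{x,\, x' \in X \\ \|x - x'\|_\infty \le \delta}}
|f_j(x) - f_j(x')| .
\]
```

The maximization ranges over two inputs at once, which is why two copies of the network are encoded side by side. An over-approximation returns some $\bar{\varepsilon}_j \ge \varepsilon_j^\star(\delta)$, which is still a sound certificate, only a looser one.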

preprint 2022 arXiv

Hypergraph Contrastive Collaborative Filtering

Collaborative Filtering (CF) has emerged as a fundamental paradigm for parameterizing users and items into latent representation space, with their correlative patterns from interaction data. Among various CF techniques, the development of GNN-based recommender systems, e.g., PinSage and LightGCN, has offered the state-of-the-art performance. However, two key challenges have not been well explored in existing solutions: i) The over-smoothing effect of deeper graph-based CF architectures may cause indistinguishable user representations and degradation of recommendation results. ii) The supervision signals (i.e., user-item interactions) are usually scarce and skewed in distribution in reality, which limits the representation power of CF paradigms. To tackle these challenges, we propose a new self-supervised recommendation framework Hypergraph Contrastive Collaborative Filtering (HCCF) to jointly capture local and global collaborative relations with a hypergraph-enhanced cross-view contrastive learning architecture. In particular, the designed hypergraph structure learning enhances the discrimination ability of GNN-based CF paradigm, so as to comprehensively capture the complex high-order dependencies among users. Additionally, our HCCF model effectively integrates the hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on the hypergraph-enhanced self-discrimination. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over various state-of-the-art recommendation methods, and the robustness against sparse user interaction data. Our model implementation codes are available at https://github.com/akaxlh/HCCF.

preprint 2022 arXiv

Knowledge Graph Contrastive Learning for Recommendation

Knowledge Graphs (KGs) have been utilized as useful side information to improve recommendation quality. In those recommender systems, knowledge graph information often contains fruitful facts and inherent semantic relatedness among items. However, the success of such methods relies on the high quality knowledge graphs, and may not learn quality representations with two challenges: i) The long-tail distribution of entities results in sparse supervision signals for KG-enhanced item representation; ii) Real-world knowledge graphs are often noisy and contain topic-irrelevant connections between items and entities. Such KG sparsity and noise make the item-entity dependent relations deviate from reflecting their true characteristics, which significantly amplifies the noise effect and hinders the accurate representation of user's preference. To fill this research gap, we design a general Knowledge Graph Contrastive Learning framework (KGCL) that alleviates the information noise for knowledge graph-enhanced recommender systems. Specifically, we propose a knowledge graph augmentation schema to suppress KG noise in information aggregation, and derive more robust knowledge-aware representations for items. In addition, we exploit additional supervision signals from the KG augmentation process to guide a cross-view contrastive learning paradigm, giving a greater role to unbiased user-item interactions in gradient descent and further suppressing the noise. Extensive experiments on three public datasets demonstrate the consistent superiority of our KGCL over state-of-the-art techniques. KGCL also achieves strong performance in recommendation scenarios with sparse user-item interactions, long-tail and noisy KG entities. Our implementation codes are available at https://github.com/yuh-yang/KGCL-SIGIR22

preprint 2022 arXiv

MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Deep learning techniques for point clouds have achieved strong performance on a range of 3D vision tasks. However, it is costly to annotate large-scale point sets, making it critical to learn generalizable representations that can transfer well across different point sets. In this paper, we study a new problem of 3D Domain Generalization (3DDG) with the goal to generalize the model to other unseen domains of point clouds without any access to them in the training process. It is a challenging problem due to the substantial geometry shift from simulated to real data, such that most existing 3D models underperform due to overfitting the complete geometries in the source domain. We propose to tackle this problem via MetaSets, which meta-learns point cloud representations from a group of classification tasks on carefully-designed transformed point sets containing specific geometry priors. The learned representations are more generalizable to various unseen domains of different geometries. We design two benchmarks for Sim-to-Real transfer of 3D point clouds. Experimental results show that MetaSets outperforms existing 3D deep learning methods by large margins.

preprint 2022 arXiv

Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

Many previous studies aim to augment collaborative filtering with deep neural network techniques, so as to achieve better recommendation performance. However, most existing deep learning-based recommender systems are designed for modeling a single type of user-item interaction behavior, which can hardly distill the heterogeneous relations between user and item. In practical recommendation scenarios, there exist multi-typed user behaviors, such as browse and purchase. Because they overlook users' multi-behavioral patterns over different items, existing recommendation methods are insufficient to capture heterogeneous collaborative signals from user multi-behavior data. Inspired by the strength of graph neural networks for structured data modeling, this work proposes a Graph Neural Multi-Behavior Enhanced Recommendation (GNMR) framework which explicitly models the dependencies between different types of user-item interactions under a graph-based message passing architecture. GNMR devises a relation aggregation network to model interaction heterogeneity, and recursively performs embedding propagation between neighboring nodes over the user-item interaction graph. Experiments on real-world recommendation datasets show that our GNMR consistently outperforms state-of-the-art methods. The source code is available at https://github.com/akaxlh/GNMR.

preprint 2022 arXiv

Multi-Behavior Sequential Recommendation with Temporal Graph Transformer

Modeling time-evolving preferences of users with their sequential item interactions, has attracted increasing attention in many online applications. Hence, sequential recommender systems have been developed to learn the dynamic user interests from the historical interactions for suggesting items. However, the interaction pattern encoding functions in most existing sequential recommender systems have focused on single type of user-item interactions. In many real-life online platforms, user-item interactive behaviors are often multi-typed (e.g., click, add-to-favorite, purchase) with complex cross-type behavior inter-dependencies. Learning from informative representations of users and items based on their multi-typed interaction data, is of great importance to accurately characterize the time-evolving user preference. In this work, we tackle the dynamic user-item relation learning with the awareness of multi-behavior interactive patterns. Towards this end, we propose a new Temporal Graph Transformer (TGT) recommendation framework to jointly capture dynamic short-term and long-range user-item interactive patterns, by exploring the evolving correlations across different types of behaviors. The new TGT method endows the sequential recommendation architecture to distill dedicated knowledge for type-specific behavior relational context and the implicit behavior dependencies. Experiments on the real-world datasets indicate that our method TGT consistently outperforms various state-of-the-art recommendation methods. Our model implementation codes are available at https://github.com/akaxlh/TGT.

preprint 2022 arXiv

Multiplex Heterogeneous Graph Convolutional Network

Heterogeneous graph convolutional networks have gained great popularity in tackling various network analytical tasks on heterogeneous network data, ranging from link prediction to node classification. However, most existing works ignore the relation heterogeneity with multiplex network between multi-typed nodes and different importance of relations in meta-paths for node embedding, which can hardly capture the heterogeneous structure signals across different relations. To tackle this challenge, this work proposes a Multiplex Heterogeneous Graph Convolutional Network (MHGCN) for heterogeneous network embedding. Our MHGCN can automatically learn the useful heterogeneous meta-path interactions of different lengths in multiplex heterogeneous networks through multi-layer convolution aggregation. Additionally, we effectively integrate both multi-relation structural signals and attribute semantics into the learned node embeddings with both unsupervised and semi-supervised learning paradigms. Extensive experiments on five real-world datasets with various network analytical tasks demonstrate the significant superiority of MHGCN against state-of-the-art embedding baselines in terms of all evaluation metrics.

preprint2022arXiv

Mutual Distillation Learning Network for Trajectory-User Linking

Trajectory-User Linking (TUL), which links trajectories to the users who generate them, has been a challenging problem due to the sparsity of check-in mobility data. Existing methods do not exploit historical data or the rich contextual features in check-in data, resulting in poor performance on the TUL task. In this paper, we propose a novel mutual distillation learning network, named MainTUL, to solve the TUL problem for sparse check-in mobility data. Specifically, MainTUL is composed of a Recurrent Neural Network (RNN) trajectory encoder that models the sequential patterns of the input trajectory and a temporal-aware Transformer trajectory encoder that captures long-term time dependencies for the corresponding augmented historical trajectories. Then, the knowledge learned on historical trajectories is transferred between the two trajectory encoders to guide the learning of both encoders and achieve mutual distillation of information. Experimental results on two real-world check-in mobility datasets demonstrate the superiority of MainTUL against state-of-the-art baselines. The source code of our model is available at https://github.com/Onedean/MainTUL.

preprint2022arXiv

Physics-Aware Safety-Assured Design of Hierarchical Neural Network based Planner

Neural networks have shown great promise in planning, control, and general decision making for learning-enabled cyber-physical systems (LE-CPSs), especially in improving performance under complex scenarios. However, it is very challenging to formally analyze the behavior of neural network based planners for ensuring system safety, which significantly impedes their application in safety-critical domains such as autonomous driving. In this work, we propose a hierarchical neural network based planner that analyzes the underlying physical scenarios of the system and learns a system-level behavior planning scheme with multiple scenario-specific motion-planning strategies. We then develop an efficient verification method that incorporates overapproximation of the system state reachable set and novel partition and union techniques for formally ensuring system safety under our physics-aware planner. With theoretical analysis, we show that considering the different physical scenarios and building a hierarchical planner based on such analysis may improve system safety and verifiability. We also empirically demonstrate the effectiveness of our approach and its advantage over other baselines in practical case studies of unprotected left turn and highway merging, two common challenging safety-critical tasks in autonomous driving.

preprint2022arXiv

RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation

Recipe recommendation systems play an essential role in helping people decide what to eat. Existing recipe recommendation systems typically focused on content-based or collaborative filtering approaches, ignoring the higher-order collaborative signal such as relational structure information among users, recipes and food items. In this paper, we formalize the problem of recipe recommendation with graphs to incorporate the collaborative signal into recipe recommendation through graph modeling. In particular, we first present URI-Graph, a new and large-scale user-recipe-ingredient graph. We then propose RecipeRec, a novel heterogeneous graph learning model for recipe recommendation. The proposed model can capture recipe content and collaborative signal through a heterogeneous graph neural network with hierarchical attention and an ingredient set transformer. We also introduce a graph contrastive augmentation strategy to extract informative graph knowledge in a self-supervised manner. Finally, we design a joint objective function of recommendation and contrastive learning to optimize the model. Extensive experiments demonstrate that RecipeRec outperforms state-of-the-art methods for recipe recommendation. Dataset and codes are available at https://github.com/meettyj/RecipeRec.

preprint2022arXiv

Scalable Motif Counting for Large-scale Temporal Graphs

One fundamental problem in temporal graph analysis is to count the occurrences of small connected subgraph patterns (i.e., motifs), which benefits a broad range of real-world applications, such as anomaly detection, structure prediction, and network representation learning. However, existing works on exact temporal motif counting are not scalable to large-scale temporal graph data, due to their heavy computational costs or inherent lack of parallelism. In this work, we propose a scalable parallel framework for exactly counting temporal motifs in large-scale temporal graphs. We first categorize the temporal motifs based on their distinct properties, and then design customized algorithms that offer efficient strategies to exactly count the motif instances of each category. Moreover, our compact data structures, namely triple and quadruple counters, enable our algorithms to directly identify the temporal motif instances of each category according to edge information and the relationship between edges, thereby significantly improving the counting efficiency. Based on the proposed counting algorithms, we design a hierarchical parallel framework that features both inter- and intra-node parallel strategies and fully leverages the multi-threading capacity of modern CPUs to concurrently count all temporal motifs. Extensive experiments on sixteen real-world temporal graph datasets demonstrate the superiority and capability of our proposed framework for temporal motif counting, achieving up to 538× speedup compared to the state-of-the-art methods. The source code of our method is available at: https://github.com/steven-ccq/FAST-temporal-motif.

preprint2022arXiv

Self-Supervised Hypergraph Transformer for Recommender Systems

Graph Neural Networks (GNNs) have been shown as promising solutions for collaborative filtering (CF) with the modeling of user-item interaction graphs. The key idea of existing GNN-based recommender systems is to recursively perform the message passing along the user-item interaction edge for refining the encoded embeddings. Despite their effectiveness, however, most of the current recommendation models rely on sufficient and high-quality training data, such that the learned representations can well capture accurate user preference. User behavior data in many practical recommendation scenarios is often noisy and exhibits skewed distribution, which may result in suboptimal representation performance in GNN-based models. In this paper, we propose SHT, a novel Self-Supervised Hypergraph Transformer framework (SHT) which augments user representations by exploring the global collaborative relationships in an explicit way. Specifically, we first empower the graph neural CF paradigm to maintain global collaborative effects among users and items with a hypergraph transformer network. With the distilled global context, a cross-view generative self-supervised learning component is proposed for data augmentation over the user-item interaction graph, so as to enhance the robustness of recommender systems. Extensive experiments demonstrate that SHT can significantly improve the performance over various state-of-the-art baselines. Further ablation studies show the superior representation ability of our SHT recommendation framework in alleviating the data sparsity and noise issues. The source code and evaluation datasets are available at: https://github.com/akaxlh/SHT.

preprint2022arXiv

Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction

Crime has become a major concern in many cities, which has raised the demand for timely prediction of citywide crime occurrence. Accurate crime prediction results are vital for proactive government decision-making to alleviate the increasing concern about public safety. While many efforts have been devoted to proposing various spatial-temporal forecasting techniques to explore dependence across locations and time periods, most of them follow a supervised learning manner, which limits their spatial-temporal representation ability on sparse crime data. Inspired by the recent success in self-supervised learning, this work proposes a Spatial-Temporal Hypergraph Self-Supervised Learning framework (ST-HSL) to tackle the label scarcity issue in crime prediction. Specifically, we propose cross-region hypergraph structure learning to encode region-wise crime dependency across the entire urban space. Furthermore, we design a dual-stage self-supervised learning paradigm, not only to jointly capture local- and global-level spatial-temporal crime patterns, but also to supplement the sparse crime representation by augmenting region self-discrimination. We perform extensive experiments on two real-life crime datasets. Evaluation results show that our ST-HSL significantly outperforms state-of-the-art baselines. Further analysis provides insights into the superiority of our ST-HSL method in the representation of spatial-temporal crime patterns. The implementation code is available at https://github.com/LZH-YS1998/STHSL.

preprint2022arXiv

Spatial-Temporal Sequential Hypergraph Network for Crime Prediction with Dynamic Multiplex Relation Learning

Crime prediction is crucial for public safety and resource optimization, yet is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, as crime events are distributed unevenly in both the spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e.g., Theft, Robbery, Assault, Damage), which reveal fine-grained semantics of crimes. To tackle these challenges, we propose the Spatial-Temporal Sequential Hypergraph Network (ST-SHN) to collectively encode complex crime spatial-temporal patterns as well as the underlying category-wise crime semantic relationships. Specifically, to handle spatial-temporal dynamics under the long-range and global context, we design a graph-structured message passing architecture with the integration of the hypergraph learning paradigm. To capture category-wise crime heterogeneous relations in a dynamic environment, we introduce a multi-channel routing mechanism to learn the time-evolving structural dependency across crime types. We conduct extensive experiments on two real-world datasets, showing that our proposed ST-SHN framework can significantly improve the prediction performance as compared to various state-of-the-art baselines. The source code is available at: https://github.com/akaxlh/ST-SHN.

preprint2022arXiv

Two-sided matching with firms' complementary preferences

This paper studies two-sided many-to-one matching in which firms have complementary preferences. We show that stable matchings exist under a balancedness condition that rules out a specific type of odd-length cycles formed by firms' acceptable sets. We also provide a class of preference profiles that satisfy this condition. Our results indicate that stable matching is compatible with a wide range of firms' complementary preferences.

preprint2021arXiv

Cocktail: Learn a Better Neural Network Controller from Multiple Experts via Adaptive Mixing and Robust Distillation

Neural networks are being increasingly applied to control and decision-making for learning-enabled cyber-physical systems (LE-CPSs). They have shown promising performance without requiring the development of complex physical models; however, their adoption is significantly hindered by the concerns on their safety, robustness, and efficiency. In this work, we propose COCKTAIL, a novel design framework that automatically learns a neural network-based controller from multiple existing control methods (experts) that could be either model-based or neural network-based. In particular, COCKTAIL first performs reinforcement learning to learn an optimal system-level adaptive mixing strategy that incorporates the underlying experts with dynamically-assigned weights and then conducts a teacher-student distillation with probabilistic adversarial training and regularization to synthesize a student neural network controller with improved control robustness (measured by a safe control rate metric with respect to adversarial attacks or measurement noises), control energy efficiency, and verifiability (measured by the computation time for verification). Experiments on three non-linear systems demonstrate significant advantages of our approach on these properties over various baseline methods.

preprint2021arXiv

Graph Meta Network for Multi-Behavior Recommendation

Modern recommender systems often embed users and items into low-dimensional latent representations, based on their observed interactions. In practical recommendation scenarios, users often exhibit various intents which drive them to interact with items through multiple behavior types (e.g., click, tag-as-favorite, purchase). However, the diversity of user behaviors is ignored in most of the existing approaches, which makes it difficult for them to capture heterogeneous relational structures across different types of interactive behaviors. Exploring multi-typed behavior patterns is of great importance to recommendation systems, yet is very challenging because of two aspects: i) the complex dependencies across different types of user-item interactions; ii) the diversity of such multi-behavior patterns, which may vary by user due to personalized preferences. To tackle the above challenges, we propose a Multi-Behavior recommendation framework with Graph Meta Network to incorporate the multi-behavior pattern modeling into a meta-learning paradigm. Our developed MB-GMN empowers the user-item interaction learning with the capability of uncovering type-dependent behavior representations, which automatically distills the behavior heterogeneity and interaction diversity for recommendations. Extensive experiments on three real-world datasets show the effectiveness of MB-GMN by significantly boosting the recommendation performance as compared to various state-of-the-art baselines. The source code is available at https://github.com/akaxlh/MB-GMN.

preprint2021arXiv

Human-Machine Adaptive Shared Control for Safe Automated Driving under Automation Degradation

In this paper, a human-machine adaptive shared control method is proposed for automated vehicles (AVs) under automation performance degradation. First, a novel risk assessment module is proposed to monitor driving behavior and evaluate automation performance degradation for AVs. Then, an adaptive control authority allocation module is developed. When performance degradation is detected, the control authority allocated to the automation system is decreased based on the assessed risk to reduce the potential risk of vehicle motion. Consequently, the control authority allocated to the human driver is adaptively increased, which requires more driver engagement in the control loop to compensate for the automation degradation and ensure AV safety. Experimental validation is conducted under different driving scenarios. The testing results show that the proposed approach is able to effectively compensate for the performance degradation of vehicle automation through the human-machine adaptive shared control, ensuring the safety of automated driving.

preprint2021arXiv

On Joint Reconstruction of State and Input-Output Injection Attacks for Nonlinear Systems

We address the problem of robust state reconstruction for discrete-time nonlinear systems when the actuators and sensors are injected with (potentially unbounded) attack signals. Exploiting redundancy in sensors and actuators and using a bank of unknown input observers (UIOs), we propose an observer-based estimator capable of providing asymptotic estimates of the system state and attack signals under the condition that the numbers of sensors and actuators under attack are sufficiently small. Using the proposed estimator, we provide methods for isolating the compromised actuators and sensors. Numerical examples are provided to demonstrate the effectiveness of our methods.

preprint2021arXiv

Recent Advances in Heterogeneous Relation Learning for Recommendation

Recommender systems have played a critical role in many web applications to meet users' personalized interests and alleviate information overload. In this survey, we review the development of recommendation frameworks with a focus on heterogeneous relational learning, which involves different types of dependencies among users and items. The objective of this task is to map heterogeneous relational data into a latent representation space, such that the structural and relational properties from both the user and item domains can be well preserved. To address this problem, recent research developments fall into three major lines: social recommendation, knowledge graph-enhanced recommender systems, and multi-behavior recommendation. We discuss the learning approaches in each category, such as matrix factorization, attention mechanisms and graph neural networks, for effectively distilling heterogeneous contextual information. Finally, we present an exploratory outlook to highlight several promising directions and opportunities in heterogeneous relational learning frameworks for recommendation.

preprint2020arXiv

A Game-Theoretic Approach to Decision Making for Multiple Vehicles at Roundabout

In this paper, we study the decision making of multiple autonomous vehicles at a roundabout. The behaviours of the vehicles depend on their aggressiveness, which indicates how much they value speed over safety. We propose a distributed decision-making process that balances safety and speed of the vehicles. In the proposed process, each vehicle estimates other vehicles' aggressiveness and formulates the interactions among the vehicles as a finite sequential game. Based on the Nash equilibrium of this game, the vehicle predicts other vehicles' behaviours and makes decisions. We perform numerical simulations to illustrate the effectiveness of the proposed process, both for safety (absence of collisions), and speed (time spent within the roundabout).

preprint2020arXiv

An Integrated Framework of Decision Making and Motion Planning for Autonomous Vehicles Considering Social Behaviors

This paper presents a novel integrated approach to decision making and motion planning for lane-change maneuvers of autonomous vehicles (AVs), considering the social behaviors of surrounding traffic occupants. Reflected by the driving styles and intentions of surrounding vehicles, the social behaviors are taken into consideration during the modelling process. Then, Stackelberg game theory is applied to solve the decision-making problem, which is formulated as a non-cooperative game. In addition, a potential field is adopted in the motion planning model, which uses different potential functions to describe surrounding vehicles with different behaviors and road constraints. Then, Model Predictive Control (MPC) is utilized to predict the state and trajectory of the autonomous vehicle. Finally, the decision making and motion planning are integrated into a constrained multi-objective optimization problem. Three testing scenarios considering different social behaviors of surrounding vehicles are carried out to validate the performance of the proposed approach. Testing results show that the integrated approach is able to address different social interactions with other traffic participants, and make proper and safe decisions and planning for autonomous vehicles, demonstrating its feasibility and effectiveness.

preprint2020arXiv

Analysis of Scoliosis From Spinal X-Ray Images

Scoliosis is a congenital disease in which the spine is deformed from its normal shape. Measurement of scoliosis requires labeling and identification of vertebrae in the spine. Spine radiographs are the most cost-effective and accessible modality for imaging the spine. Reliable and accurate vertebrae segmentation in spine radiographs is crucial in image-guided spinal assessment, disease diagnosis, and treatment planning. Conventional assessments rely on tedious and time-consuming manual measurement, which is subject to inter-observer variability. A fully automatic method that can accurately identify and segment the associated vertebrae is unavailable in the literature. Leveraging a carefully-adjusted U-Net model with progressive side outputs, we propose an end-to-end segmentation model that provides a fully automatic and reliable segmentation of the vertebrae associated with scoliosis measurement. Our experimental results from a set of anterior-posterior spine X-Ray images indicate that our model, which achieves an average Dice score of 0.993, promises to be an effective tool in the identification and labeling of spinal vertebrae, eventually helping doctors in the reliable estimation of scoliosis. Moreover, estimation of Cobb angles from the segmented vertebrae further demonstrates the effectiveness of our model.
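
For readers unfamiliar with the Dice score quoted above, the following is a minimal sketch of how the metric is commonly computed for binary segmentation masks, 2|A∩B| / (|A| + |B|). The function name `dice_score`, the nested-list mask format, and the toy masks are assumptions for illustration only; they are not taken from the paper's code or data.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as nested lists.

    Hypothetical helper for illustration; returns 1.0 when both masks are empty.
    """
    pred_px = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    truth_px = {(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v}
    if not pred_px and not truth_px:
        return 1.0  # convention: two empty masks agree perfectly
    overlap = len(pred_px & truth_px)
    return 2 * overlap / (len(pred_px) + len(truth_px))


# Toy 2x3 masks (made-up values): two of the three foreground pixels agree.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_score(predicted, reference)
```

A score of 1.0 means the predicted and reference masks coincide exactly, so values near 1 indicate close agreement between the two masks.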

preprint2020arXiv

Balanced paring of $\{1,2,\ldots,(p-1)/2\}$ for $p\equiv 1 \pmod{4}$

Let $p\equiv 1 \pmod{4}$ be a prime. Write $t = \prod_{x=1}^{(p-1)/2}x$. Since $t^2\equiv -1 \pmod{p}$, we can divide $\{1,2,\ldots,(p-1)/2\}$ into $(p-1)/4$ ordered pairs so that each pair, say $\langle a,\tilde{a}\rangle$, satisfies $t a \equiv \pm \tilde{a} \pmod{p}$. For any two such pairs, assume $a<\tilde{a}$, $b<\tilde{b}$, $a<b$; then there are three possibilities for their relative order: $a<\tilde{a}<b<\tilde{b}$, $a<b<\tilde{a}<\tilde{b}$, $a<b<\tilde{b}<\tilde{a}$. We show this pairing is balanced in the sense that the three cases occur with equal frequencies. Utilizing properties of this pairing, we solve one problem raised by Zhi-Wei Sun concerning the sign of a permutation related to quadratic residues.
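
The construction in this abstract can be checked numerically for small primes. The sketch below builds the pairs from $t = ((p-1)/2)! \bmod p$ and counts the three relative orders; the function names and the labels "disjoint", "crossing" and "nested" are illustrative assumptions, not terminology from the paper.

```python
from itertools import combinations
from math import factorial


def balanced_pairs(p):
    """Pair up {1, ..., (p-1)/2} via a -> +-(t*a) mod p, for a prime p = 1 (mod 4)."""
    half = (p - 1) // 2
    t = factorial(half) % p
    assert (t * t) % p == p - 1, "expects a prime p congruent to 1 mod 4"
    pairs = set()
    for a in range(1, half + 1):
        r = (t * a) % p
        partner = r if r <= half else p - r  # choose the sign that lands in the half set
        pairs.add((min(a, partner), max(a, partner)))
    return sorted(pairs)


def order_counts(pairs):
    """Count the relative orders of every two pairs (a, a~) and (b, b~) with a < b."""
    counts = {"disjoint": 0, "crossing": 0, "nested": 0}
    for (a, a2), (b, b2) in combinations(pairs, 2):
        if a2 < b:
            counts["disjoint"] += 1   # a < a~ < b < b~
        elif a2 < b2:
            counts["crossing"] += 1   # a < b < a~ < b~
        else:
            counts["nested"] += 1     # a < b < b~ < a~
    return counts


pairs_13 = balanced_pairs(13)        # [(1, 5), (2, 3), (4, 6)]
counts_13 = order_counts(pairs_13)   # each of the three orders occurs once
```

For $p = 13$, $t = 6! \bmod 13 = 5$, and the three pairs give one instance of each relative order, in line with the balance property stated in the abstract.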

preprint2020arXiv

Bipartite Distance for Shape-Aware Landmark Detection in Spinal X-Ray Images

Scoliosis is a congenital disease that causes lateral curvature in the spine. Its assessment relies on the identification and localization of vertebrae in spinal X-ray images, conventionally via tedious and time-consuming manual radiographic procedures that are prone to subjectivity and observational variability. Reliability can be improved through the automatic detection and localization of spinal landmarks. To guide a CNN in the learning of spinal shape while detecting landmarks in X-ray images, we propose a novel loss based on a bipartite distance (BPD) measure, and show that it consistently improves landmark detection performance.

preprint2020arXiv

Cross-Layer Design of Automotive Systems

With growing system complexity and closer cyber-physical interaction, there are increasingly stronger dependencies between different function and architecture layers in automotive systems. This paper first introduces several cross-layer approaches we developed in the past for holistically addressing multiple system layers in the design of individual vehicles and of connected vehicle applications; and then presents a new methodology based on the weakly-hard paradigm for leveraging the scheduling flexibility in architecture layer to improve the system performance at function layer. The results of these works demonstrate the importance and effectiveness of cross-layer design for automotive systems.

preprint2020arXiv

Extreme Image Coding via Multiscale Autoencoders With Generative Adversarial Optimization

We propose a MultiScale AutoEncoder (MSAE) based extreme image compression framework to offer visually pleasing reconstruction at a very low bitrate. Our method leverages the "priors" at different resolution scales to improve the compression efficiency, and also employs a generative adversarial network (GAN) with multiscale discriminators to perform end-to-end trainable rate-distortion optimization. We compare the perceptual quality of our reconstructions with traditional compression algorithms using the High-Efficiency Video Coding (HEVC) based Intra Profile and JPEG2000 on the public Cityscapes and ADE20K datasets, demonstrating a significant subjective quality improvement.

preprint2020arXiv

Human-Machine Collaboration for Automated Vehicles via an Intelligent Two-Phase Haptic Interface

Prior to realizing fully autonomous driving, human intervention will be required periodically to guarantee vehicle safety. This fact poses a new challenge in human-machine interaction, particularly during control authority transition from the automated functionality to a human driver. This paper addresses this challenge by proposing an intelligent haptic interface based on a newly developed two-phase human-machine interaction model. The intelligent haptic torque is applied on the steering wheel and switches its functionality between predictive guidance and haptic assistance according to the varying state and control ability of human drivers, helping drivers gradually resume manual control during takeover. The developed approach is validated by conducting vehicle experiments with 26 human participants. The results suggest that the proposed method can effectively enhance the driving state recovery and control performance of human drivers during takeover compared with an existing approach, further improving the safety and smoothness of the human-machine interaction in automated vehicles.

preprint2020arXiv

Inner Attention Supported Adaptive Cooperation for Heterogeneous Multi Robots Teaming based on Multi-agent Reinforcement Learning

Humans can selectively focus on different information based on different task requirements and on other people's abilities and availability. Therefore, they can adapt quickly to completely different and complex environments. If robots could obtain the same abilities, it would greatly increase their adaptability to new and unexpected situations. Recent efforts in heterogeneous multi-robot teaming have tried to achieve this ability, for example through methods based on communication and multi-modal information fusion strategies. However, these methods not only suffer from the exponential explosion problem as the number of robots increases but also need huge computational resources. To that end, we introduce an inner attention actor-critic method that replicates aspects of flexible human cooperation. By bringing the attention mechanism from computer vision and natural language processing into the realm of multi-robot cooperation, our attention method is able to dynamically select which robots to attend to. To test the effectiveness of our proposed method, several simulation experiments have been designed. The results show that the inner attention mechanism can enable flexible cooperation and lower resource consumption in rescue tasks.

preprint2020arXiv

Legendre Symbol of $\prod f(i,j) $ over $ 0<i<j<p/2, \ p\nmid f(i,j) $

Let $p>3$ be a prime. We investigate the Legendre symbol of $\displaystyle \prod_{0<i<j<p/2 \atop p\nmid f(i,j)} f(i,j)$, where $i,j\in \Bbb Z$ and $f(i,j)$ is a linear or quadratic form with integer coefficients. When $f=ai^2+bij+cj^2$ and $p\nmid c(a+b+c)$, we prove that evaluating the product is equivalent to determining $\displaystyle \sum_{y=1}^{p-1} \bigg(\frac{y(y+1)(y+k)}{p}\bigg) \pmod{16}$, where $4c(a+b+c)k \equiv (4ac-b^2)\pmod{p}$. Parallel results are given for $\displaystyle \prod_{i,j=1 \atop p\nmid f(i,j)}^{(p-1)/2} \bigg(\frac{f(i,j)}{p}\bigg)$. Then we show that $\displaystyle \sum_{y=1}^{p-1} \bigg(\frac{y(y+1)(y+k)}{p}\bigg) \pmod{16}$ can be evaluated explicitly when $k=2,4,5,9,10$ or $k$ is a square. For several classes of $f(i,j)$, these two kinds of products can be evaluated explicitly. Finally, when $f$ is a linear form, we give unified identities for these products. Thus we solve problems of this kind raised in Sun's paper.

preprint2020arXiv

Non-Local Part-Aware Point Cloud Denoising

This paper presents a novel non-local part-aware deep neural network to denoise point clouds by exploring the inherent non-local self-similarity in 3D objects and scenes. Different from existing works that explore small local patches, we design the non-local learning unit (NLU) customized with a graph attention module to adaptively capture non-local semantically-related features over the entire point cloud. To enhance the denoising performance, we cascade a series of NLUs to progressively distill the noise features from the noisy inputs. Further, besides the conventional surface reconstruction loss, we formulate a semantic part loss to regularize the predictions towards the relevant parts and enable denoising in a part-aware manner. Lastly, we perform extensive experiments to evaluate our method, both quantitatively and qualitatively, and demonstrate its superiority over the state of the art on both synthetic and real-scanned noisy inputs.

preprint2020arXiv

Opportunistic Intermittent Control with Safety Guarantees for Autonomous Systems

Control schemes for autonomous systems are often designed in a way that anticipates the worst case in any situation. At runtime, however, there could exist opportunities to leverage the characteristics of specific environment and operation context for more efficient control. In this work, we develop an online intermittent-control framework that combines formal verification with model-based optimization and deep reinforcement learning to opportunistically skip certain control computation and actuation to save actuation energy and computational resources without compromising system safety. Experiments on an adaptive cruise control system demonstrate that our approach can achieve significant energy and computation savings.

preprint2020arXiv

Partly Supervised Multitask Learning

Semi-supervised learning has recently been attracting attention as an alternative to fully supervised models that require large pools of labeled data. Moreover, optimizing a model for multiple tasks can provide better generalizability than single-task learning. Leveraging self-supervision and adversarial training, we propose a novel general purpose semi-supervised, multiple-task model---namely, self-supervised, semi-supervised, multitask learning (S$^4$MTL)---for accomplishing two important tasks in medical imaging, segmentation and diagnostic classification. Experimental results on chest and spine X-ray datasets suggest that our S$^4$MTL model significantly outperforms semi-supervised single task, semi/fully-supervised multitask, and fully-supervised single task models, even with a 50\% reduction of class and segmentation labels. We hypothesize that our proposed model can be effective in tackling limited annotation problems for joint training, not only in medical imaging domains, but also for general-purpose vision tasks.

preprint2020arXiv

Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series

The prevalence of wearable sensors (e.g., smart wristbands) is creating unprecedented opportunities not only to inform the health and wellness states of individuals, but also to assess and infer personal attributes, including demographic and personality attributes. However, the data captured from wearables, such as heart rate or number of steps, present two key challenges: 1) the time series is often of variable length and incomplete due to different data collection periods (e.g., wearing behavior varies by person); and 2) inter-individual variability in response to external factors like stress and environment. This paper addresses these challenges and brings us closer to the potential of personalized insights about an individual, taking the leap from quantified self to qualified self. Specifically, HeartSpace, proposed in this paper, encodes time series data with variable length and missing values via the integration of a time series encoding module and a pattern aggregation network. Additionally, HeartSpace implements a Siamese-triplet network to optimize representations by jointly capturing intra- and inter-series correlations during the embedding learning process. The empirical evaluation over two different real-world datasets shows significant performance gains over state-of-the-art baselines in a variety of applications, including personality prediction, demographics inference, and user identification.

preprint2020arXiv

SAW: A Tool for Safety Analysis of Weakly-hard Systems

We introduce SAW, a tool for safety analysis of weakly-hard systems, in which traditional hard timing constraints are relaxed to allow bounded deadline misses for improving design flexibility and runtime resiliency. Safety verification is a key issue for weakly-hard systems, as it ensures system safety under allowed deadline misses. Previous works are either for linear systems only, or limited to a certain type of nonlinear systems (e.g., systems that satisfy exponential stability and Lipschitz continuity of the system dynamics). In this work, we propose a new technique for infinite-time safety verification of general nonlinear weakly-hard systems. Our approach first discretizes the safe state set into grids and constructs a directed graph, where nodes represent the grids and edges represent the reachability relation. Based on graph theory and dynamic programming, our approach can effectively find the safe initial set (consisting of a set of grids), from which the system can be proven safe under given weakly-hard constraints. Experimental results demonstrate the effectiveness of our approach, when compared with the state-of-the-art. An open source implementation of our tool is available at https://github.com/551100kk/SAW. The virtual machine where the tool is ready to run can be found at https://www.csie.ntu.edu.tw/~r08922054/SAW.ova.
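The graph-based step described in the abstract can be loosely sketched as a greatest fixed-point computation over a reachability graph of grid cells. This is a simplified illustration and not SAW's algorithm: it ignores the weakly-hard deadline-miss pattern entirely, and the `successors` mapping, the `"OUT"` sentinel for leaving the safe region, and the function name are hypothetical.

```python
def safe_initial_cells(successors):
    """Return the cells from which no path can ever leave the safe region.

    `successors` maps each grid cell to the set of cells reachable in one
    step. A successor that is not itself a key (for example the sentinel
    "OUT") is treated as unsafe. Cells are pruned until a fixed point.
    """
    safe = set(successors)
    changed = True
    while changed:
        changed = False
        for cell in list(safe):
            # A cell stays only if every one-step successor is still safe.
            if not successors[cell] <= safe:
                safe.remove(cell)
                changed = True
    return safe

# Toy reachability graph: cells 0 and 1 cycle forever, cell 2 may exit,
# and cell 3 leads to cell 2.
graph = {0: {1}, 1: {0}, 2: {1, "OUT"}, 3: {2}}
result = safe_initial_cells(graph)
```

Here cell 2 is pruned first because it can reach `"OUT"`, and cell 3 is pruned next because it can reach cell 2, leaving cells 0 and 1. Handling weakly-hard constraints would presumably require tracking the recent history of deadline hits and misses alongside each cell, which this sketch does not attempt.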

preprint2019arXiv

Graph Fourier Transform Based on $\ell_1$ Norm Variation Minimization

The definition of the graph Fourier transform is a fundamental issue in graph signal processing. Conventional graph Fourier transform is defined through the eigenvectors of the graph Laplacian matrix, which minimize the $\ell_2$ norm signal variation. However, the computation of Laplacian eigenvectors is expensive when the graph is large. In this paper, we propose an alternative definition of graph Fourier transform based on the $\ell_1$ norm variation minimization. We obtain a necessary condition satisfied by the $\ell_1$ Fourier basis, and provide a fast greedy algorithm to approximate the $\ell_1$ Fourier basis. Numerical experiments show the effectiveness of the greedy algorithm. Moreover, the Fourier transform under the greedy basis demonstrates a similar rate of decay to that of Laplacian basis for simulated or real signals.
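For orientation, the conventional Laplacian-based transform that the abstract contrasts against can be sketched in a few lines of NumPy, together with one common form of the $\ell_1$ variation. The paper's exact $\ell_1$ definition and its greedy algorithm are not given in the abstract, so `l1_variation` below is an assumed form, and the greedy basis construction is not shown.

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_gft(W, x):
    """Conventional GFT: project x onto the eigenvectors of L."""
    eigvals, U = np.linalg.eigh(graph_laplacian(W))  # ascending graph frequencies
    return eigvals, U, U.T @ x

def l2_variation(W, x):
    """x^T L x, i.e. the sum over edges of w_ij * (x_i - x_j)^2."""
    return float(x @ graph_laplacian(W) @ x)

def l1_variation(W, x):
    """Assumed form: sum over undirected edges of w_ij * |x_i - x_j|."""
    diff = np.abs(x[:, None] - x[None, :])
    return float(0.5 * (W * diff).sum())  # 0.5 because each edge appears twice

# Unweighted path graph on four nodes and a small test signal.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = np.array([0.0, 1.0, 3.0, 3.0])

eigvals, U, x_hat = laplacian_gft(W, x)
x_back = U @ x_hat  # inverse transform recovers the signal
```

The full eigendecomposition is the expensive step the abstract refers to for large graphs, since a dense solver scales roughly cubically in the number of nodes.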

preprint2015arXiv

Joint Reconstruction of Absorbed Optical Energy Density and Sound Speed Distribution in Photoacoustic Computed Tomography: A numerical Investigation

Photoacoustic computed tomography (PACT) is a rapidly emerging bioimaging modality that seeks to reconstruct an estimate of the absorbed optical energy density within an object. Conventional PACT image reconstruction methods assume a constant speed-of-sound (SOS), which can result in image artifacts when acoustic aberrations are significant. It has been demonstrated that incorporating knowledge of an object's SOS distribution into a PACT image reconstruction method can improve image quality. However, in many cases, the SOS distribution cannot be accurately and/or conveniently estimated prior to the PACT experiment. Because variations in the SOS distribution induce aberrations in the measured photoacoustic wavefields, certain information regarding an object's SOS distribution is encoded in the PACT measurement data. Based on this observation, a joint reconstruction (JR) problem has been proposed in which the SOS distribution is concurrently estimated along with the sought-after absorbed optical energy density from the photoacoustic measurement data. A broad understanding of the extent to which the JR problem can be accurately and reliably solved has not been reported. In this work, a series of numerical experiments is described that elucidate some important properties of the JR problem that pertain to its practical feasibility. To accomplish this, an optimization-based formulation of the JR problem is developed that yields a non-linear iterative algorithm that alternatingly updates the two image estimates. Heuristic analytic insights into the reconstruction problem are also provided. These results confirm the ill-conditioned nature of the joint reconstruction problem that will present significant challenges for practical applications.

preprint2013arXiv

Accelerating Image Reconstruction in Three-Dimensional Optoacoustic Tomography on Graphics Processing Units

Purpose: Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional (2D) imaging models. One important reason is that 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming techniques. Methods: Parallelization strategies are proposed to accelerate a filtered backprojection (FBP) algorithm and two different pairs of projection/backprojection operations that correspond to two different numerical imaging models. The algorithms are designed to fully exploit the parallel computing power of graphics processing units (GPUs). In order to evaluate the parallelization strategies for the projection/backprojection pairs, an iterative image reconstruction algorithm is implemented. Computer-simulation and experimental studies are conducted to investigate the computational efficiency and numerical accuracy of the developed algorithms. Results: The GPU implementations improve the computational efficiency by factors of 1,000, 125, and 250 for the FBP algorithm and the two pairs of projection/backprojection operators, respectively. Accurate images are reconstructed by use of the FBP and iterative image reconstruction algorithms from both computer-simulated and experimental data. Conclusions: Parallelization strategies for 3D OAT image reconstruction are proposed for the first time. These GPU-based implementations significantly reduce the computational time for 3D image reconstruction, complementing our earlier work on 3D OAT iterative image reconstruction.

preprint2013arXiv

Full-Wave Iterative Image Reconstruction in Photoacoustic Tomography with Acoustically Inhomogeneous Media

Existing approaches to image reconstruction in photoacoustic computed tomography (PACT) with acoustically heterogeneous media are limited to weakly varying media, are computationally burdensome, and/or cannot effectively mitigate the effects of measurement data incompleteness and noise. In this work, we develop and investigate a discrete imaging model for PACT that is based on the exact photoacoustic (PA) wave equation and facilitates the circumvention of these limitations. A key contribution of the work is the establishment of a procedure to implement a matched forward and backprojection operator pair associated with the discrete imaging model, which permits application of a wide range of modern image reconstruction algorithms that can mitigate the effects of data incompleteness and noise. The forward and backprojection operators are based on the k-space pseudospectral method for computing numerical solutions to the PA wave equation in the time domain. The developed reconstruction methodology is investigated by use of both computer-simulated and experimental PACT measurement data.

53 published item(s)

Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method

Efficient Data Selection for Multimodal Models via Incremental Optimization Utility

FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

Semantic visually-guided acoustic highlighting with large vision-language models

SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere

LLMRec: Large Language Models with Graph Augmentation for Recommendation

A Tool for Neural Network Global Robustness Certification and Training

Atomic Filter: a Weak Form of Shift Operator for Graph Signals

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

Contrastive Meta Learning with Behavior Multiplicity for Recommendation

Cross-Silo Federated Learning: Challenges and Opportunities

Decision Making for Connected Automated Vehicles at Urban Intersections Considering Social and Individual Benefits

Driving Conflict Resolution of Autonomous Vehicles at Unsignalized Intersections: A Differential Game Approach

Efficient Global Robustness Certification of Neural Networks via Interleaving Twin-Network Encoding

Hypergraph Contrastive Collaborative Filtering

Knowledge Graph Contrastive Learning for Recommendation

MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

Multi-Behavior Sequential Recommendation with Temporal Graph Transformer

Multiplex Heterogeneous Graph Convolutional Network

Mutual Distillation Learning Network for Trajectory-User Linking

Physics-Aware Safety-Assured Design of Hierarchical Neural Network based Planner

RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation

Scalable Motif Counting for Large-scale Temporal Graphs

Self-Supervised Hypergraph Transformer for Recommender Systems

Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction

Spatial-Temporal Sequential Hypergraph Network for Crime Prediction with Dynamic Multiplex Relation Learning

Two-sided matching with firms' complementary preferences

Cocktail: Learn a Better Neural Network Controller from Multiple Experts via Adaptive Mixing and Robust Distillation

Graph Meta Network for Multi-Behavior Recommendation

Human-Machine Adaptive Shared Control for Safe Automated Driving under Automation Degradation

On Joint Reconstruction of State and Input-Output Injection Attacks for Nonlinear Systems

Recent Advances in Heterogeneous Relation Learning for Recommendation

A Game-Theoretic Approach to Decision Making for Multiple Vehicles at Roundabout

An Integrated Framework of Decision Making and Motion Planning for Autonomous Vehicles Considering Social Behaviors

Analysis of Scoliosis From Spinal X-Ray Images

Balanced paring of $\{1,2,\ldots,(p-1)/2\}$ for $p\equiv 1 \pmod{4}$

Bipartite Distance for Shape-Aware Landmark Detection in Spinal X-Ray Images

Cross-Layer Design of Automotive Systems

Extreme Image Coding via Multiscale Autoencoders With Generative Adversarial Optimization

Human-Machine Collaboration for Automated Vehicles via an Intelligent Two-Phase Haptic Interface

Inner Attention Supported Adaptive Cooperation for Heterogeneous Multi Robots Teaming based on Multi-agent Reinforcement Learning

Legendre Symbol of $\prod f(i,j) $ over $ 0<i<j<p/2, \ p\nmid f(i,j) $

Non-Local Part-Aware Point Cloud Denoising

Opportunistic Intermittent Control with Safety Guarantees for Autonomous Systems

Partly Supervised Multitask Learning

Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series

SAW: A Tool for Safety Analysis of Weakly-hard Systems

Graph Fourier Transform Based on $\ell_1$ Norm Variation Minimization

Joint Reconstruction of Absorbed Optical Energy Density and Sound Speed Distribution in Photoacoustic Computed Tomography: A numerical Investigation

Accelerating Image Reconstruction in Three-Dimensional Optoacoustic Tomography on Graphics Processing Units

Full-Wave Iterative Image Reconstruction in Photoacoustic Tomography with Acoustically Inhomogeneous Media