Source author record

Xiang Zhang

Xiang Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Catalog footprint

What is connected

140 works

53 topics

4 close collaborators

preprint, 2026, arXiv

A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

Gene therapies aim to address the root causes of diseases, particularly those stemming from rare genetic defects that can be life-threatening or severely debilitating. Although an increasing number of gene therapies have received regulatory approvals in recent years, understanding their long-term efficacy in trials with limited follow-up time remains challenging. To address this critical question, we propose a novel Bayesian framework designed to selectively integrate relevant external data with internal trial data to improve the inference of the durability of long-term efficacy. We proved that the proposed method has desired theoretical properties, such as identifying and favoring external subsets deemed relevant, where the relevance is defined as the similarity, induced by the marginal likelihood, between the generating mechanisms of the internal data and the selected external data. We also conducted comprehensive simulations to evaluate its performance under various scenarios. Furthermore, we apply this method to predict and infer the long-term endogenous factor IX (FIX) levels of patients who receive Etranacogene dezaparvovec. Our estimated long-term FIX levels, validated by recent trial data, indicate that Etranacogene dezaparvovec induces sustained FIX production. Together, the theoretical findings, simulation results, and successful application of this framework underscore its potential to address similar long-term effectiveness estimation and inference questions in real-world applications.

preprint, 2026, arXiv

Beyond Physical Labels: Redefining Domains for Robust WiFi-based Gesture Recognition

In this paper, we propose GesFi, a novel WiFi-based gesture recognition system that introduces WiFi latent domain mining to redefine domains directly from the data itself. GesFi first processes raw sensing data collected from WiFi receivers using CSI-ratio denoising, Short-Time Fast Fourier Transform, and visualization techniques to generate standardized input representations. It then employs class-wise adversarial learning to suppress gesture semantics and leverages unsupervised clustering to automatically uncover latent domain factors responsible for distributional shifts. These latent domains are then aligned through adversarial learning to support robust cross-domain generalization. Finally, the system is applied to the target environment for robust gesture inference. We deployed GesFi under both single-pair and multi-pair settings using commodity WiFi transceivers, and evaluated it across multiple public datasets and real-world environments. Compared to state-of-the-art baselines, GesFi achieves up to 78% and 50% performance improvements over existing adversarial methods, and consistently outperforms prior generalization approaches across most cross-domain tasks.

preprint, 2026, arXiv

Breaking Coordinate Overfitting: Geometry-Aware WiFi Sensing for Cross-Layout 3D Pose Estimation

WiFi-based 3D human pose estimation offers a low-cost and privacy-preserving alternative to vision-based systems for smart interaction. However, existing approaches rely on visual 3D poses as supervision and directly regress CSI to a camera-based coordinate system. We find that this practice leads to coordinate overfitting: models memorize deployment-specific WiFi transceiver layouts rather than only learning activity-relevant representations, resulting in severe generalization failures. To address this challenge, we present PerceptAlign, the first geometry-conditioned framework for WiFi-based cross-layout pose estimation. PerceptAlign introduces a lightweight coordinate unification procedure that aligns WiFi and vision measurements in a shared 3D space using only two checkerboards and a few photos. Within this unified space, it encodes calibrated transceiver positions into high-dimensional embeddings and fuses them with CSI features, making the model explicitly aware of device geometry as a conditional variable. This design forces the network to disentangle human motion from deployment layouts, enabling robust and, for the first time, layout-invariant WiFi pose estimation. To support systematic evaluation, we construct the largest cross-domain 3D WiFi pose estimation dataset to date, comprising 21 subjects, 5 scenes, 18 actions, and 7 device layouts. Experiments show that PerceptAlign reduces in-domain error by 12.3% and cross-domain error by more than 60% compared to state-of-the-art baselines. These results establish geometry-conditioned learning as a viable path toward scalable and practical WiFi sensing.

preprint, 2026, arXiv

Cross-Subject Generalization for EEG Decoding: A Survey of Deep Learning Methods

Deep learning for cross-subject EEG decoding is hindered by high inter-subject variability, which introduces a severe domain shift between training and unseen test subjects. This survey presents a comprehensive review of deep learning methodologies specifically engineered to address this cross-subject generalization challenge. To ground this analysis, we formalize the cross-subject setting as a multi-source domain problem and delineate the rigorous, subject-independent evaluation protocols required for valid assessment. Central to this survey is a systematic taxonomy of the current literature into discrete methodological families, including feature alignment, adversarial learning, feature disentanglement, and contrastive learning. We conclude by examining three critical elements for advancing robust, real-world decoding: the theoretical limitations of current methodologies, the structural value of subject identity, and the emergence of EEG foundation models.

preprint, 2026, arXiv

DexH2R: Task-oriented Dexterous Manipulation from Human to Robots

Dexterous manipulation is a critical aspect of human capability, enabling interaction with a wide variety of objects. Recent advancements in learning from human demonstrations and teleoperation have enabled robots to make progress in this ability. However, these approaches either require complex data collection such as costly human effort for eye-robot contact, or suffer from poor generalization when faced with novel scenarios. To solve both challenges, we propose a framework, DexH2R, that combines human hand motion retargeting with a task-oriented residual action policy, improving task performance by bridging the embodiment gap between human and robotic dexterous hands. Specifically, DexH2R learns the residual policy directly from retargeted primitive actions and task-oriented rewards, eliminating the need for labor-intensive teleoperation systems. Moreover, we incorporate test-time guidance for novel scenarios by taking in desired trajectories of human hands and objects, allowing the dexterous hand to acquire new skills with high generalizability. Extensive experiments in both simulation and real-world environments demonstrate the effectiveness of our work, outperforming prior state-of-the-art methods by 40% across various settings.

preprint, 2026, arXiv

DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models

Continual learning enables vision-language models to accumulate knowledge and adapt to evolving tasks without retraining from scratch. However, in multi-domain task-incremental learning, large domain shifts intensify the stability-plasticity dilemma. Most existing methods rely on fixed architectures with statically allocated parameters, which limits adaptation to new domains and aggravates catastrophic forgetting. To address these challenges, we propose DIMoE-Adapters, a Dynamic Incremental Mixture-of-Experts Adapters framework that introduces a dynamic expert evolution paradigm to balance stability and plasticity. This paradigm is implemented through two collaborative components: Self-Calibrated Expert Evolution (SCEE) and Prototype-Guided Expert Selection (PGES). SCEE constructs and evolves a sparse expert pool through expert optimization dynamics, improving plasticity while reducing redundant capacity. PGES controls expert utilization based on the pool shaped by SCEE, improving stability across both previously encountered and unseen tasks. Extensive experiments show that DIMoE-Adapters outperforms previous state-of-the-art methods across various settings.

preprint, 2026, arXiv

FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning

Tabular data underpins high-stakes, critical decision-making in domains such as finance, healthcare, and scientific discovery. Yet, learning effectively from tabular data in few-shot settings, where labeled examples are scarce, remains a fundamental challenge. Traditional tree-based methods often falter in these regimes due to their reliance on statistical purity metrics, which become unstable and prone to overfitting with limited supervision. At the same time, direct applications of large language models (LLMs) often overlook the inherent structure of tabular data, leading to suboptimal performance. To overcome these limitations, we propose FORESTLLM, a novel framework that unifies the structural inductive biases of decision forests with the semantic reasoning capabilities of LLMs. Crucially, FORESTLLM leverages the LLM only during training, treating it as an offline model designer that encodes rich, contextual knowledge into a lightweight, interpretable forest model, eliminating the need for LLM inference at test time. Our method is two-fold. First, we introduce a semantic splitting criterion in which the LLM evaluates candidate partitions based on their coherence over both labeled and unlabeled data, enabling the induction of more robust and generalizable tree structures under few-shot supervision. Second, we propose a one-time in-context inference mechanism for leaf node stabilization, where the LLM distills the decision path and its supporting examples into a concise, deterministic prediction, replacing noisy empirical estimates with semantically informed outputs. Across a diverse suite of few-shot classification and regression benchmarks, FORESTLLM achieves state-of-the-art performance.

preprint, 2026, arXiv

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Soft boundaries, like thin hairs, are commonly observed in natural and computer-generated imagery, but they remain challenging for 3D vision due to the ambiguous mixing of foreground and background cues. This paper introduces Guardians of the Hair (HairGuard), a framework designed to recover fine-grained soft boundary details in 3D vision tasks. Specifically, we first propose a novel data curation pipeline that leverages image matting datasets for training and design a depth fixer network to automatically identify soft boundary regions. With a gated residual module, the depth fixer refines depth precisely around soft boundaries while maintaining global depth quality, allowing plug-and-play integration with state-of-the-art depth models. For view synthesis, we perform depth-based forward warping to retain high-fidelity textures, followed by a generative scene painter that fills disoccluded regions and eliminates redundant background artifacts within soft boundaries. Finally, a color fuser adaptively combines warped and inpainted results to produce novel views with consistent geometry and fine-grained details. Extensive experiments demonstrate that HairGuard achieves state-of-the-art performance across monocular depth estimation, stereo image/video conversion, and novel view synthesis, with significant improvements in soft boundary regions.

preprint, 2026, arXiv

Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models

Class-incremental learning aims to continuously acquire new knowledge while preserving previously learned information, thereby mitigating catastrophic forgetting. Existing methods primarily restrict parameter updates but often overlook their structural properties in high-dimensional spaces. From a subspace perspective, updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference and severe forgetting. To address this issue, we propose HDSD, a Hierarchical Dual-Subspace Decoupling framework for continual learning in vision-language models. Specifically, we introduce a lightweight Feature Modulation Module (FMM) that explicitly decomposes the parameter space into general and task-specific subspaces. Building on this design, we develop two complementary components. First, a General Fusion Module (GFM) evaluates relative parameter changes across tasks and uses an adaptive threshold to capture stable and transferable knowledge. Second, a Hierarchical Learning Module (HLM) performs structured parameter decomposition via Singular Value Decomposition (SVD) and uses a scaling mechanism to constrain updates within distinct subspace scales. Together, these designs reduce subspace interference and parameter drift. Extensive experiments on conventional benchmarks show that HDSD achieves state-of-the-art results.

preprint, 2026, arXiv

Hierarchical Secure Aggregation with Heterogeneous Security Constraints and Arbitrary User Collusion

In hierarchical secure aggregation (HSA), a server communicates with clustered users through an intermediate layer of relays to compute the sum of users' inputs under two security requirements: server security and relay security. Server security requires that the server learns nothing beyond the desired sum even when colluding with a subset of users, while relay security requires that each relay remains oblivious to the users' inputs under collusion. Existing work on HSA enforces homogeneous security where all inputs must be protected against any subset of potential colluding users with sizes up to a predefined threshold. Such a homogeneous formulation cannot capture scenarios with heterogeneous security requirements where different users may demand various levels of protection. In this paper, we study hierarchical secure aggregation (HSA) with heterogeneous security requirements and arbitrary user collusion. Specifically, we consider scenarios where the inputs of certain groups of users must remain information-theoretically secure against inference by the server or any relay, even if the server or any relay colludes with an arbitrary subset of other users. Under server security, the server learns nothing about these protected inputs beyond the prescribed aggregate sum, despite any such collusion. Under relay security, each relay similarly obtains no information about the protected inputs under the same collusion model. We characterize the optimal communication rates achievable across all layers for all parameter regimes. Furthermore, we study the minimum source keys required at the users to ensure security. For this source key requirement, we provide tight characterizations in two broad regimes determined by the security and collusion constraints, and establish a general information-theoretic lower bound together with a bounded-gap achievable scheme for the remaining regime.

preprint, 2026, arXiv

Optimal Communication and Key Rate Region for Hierarchical Secure Aggregation with User Collusion

Secure aggregation is concerned with the task of securely uploading the inputs of multiple users to an aggregation server without letting the server know the inputs beyond their summation. It finds broad applications in distributed machine learning paradigms such as federated learning (FL) where multiple clients, each having access to a proprietary dataset, periodically upload their locally trained models (abstracted as inputs) to a parameter server which then generates an aggregate (e.g., averaged) model that is sent back to the clients as an initializing point for a new round of local training. To enhance the data privacy of the clients, secure aggregation protocols are developed using techniques from cryptography to ensure that the server infers no more information of the users' inputs beyond the desired aggregated input, even if the server can collude with some users. Although laying the ground for understanding the fundamental utility-security trade-off in secure aggregation, the simple star client-server architecture cannot capture more complex network architectures used in practical systems. Motivated by hierarchical federated learning, we investigate the secure aggregation problem in a $3$-layer hierarchical network consisting of clustered users connecting to an aggregation server through an intermediate layer of relays. Besides the conventional server security which requires that the server learns nothing beyond the desired sum of inputs, relay security is also imposed so that the relays infer nothing about the users' inputs and remain oblivious. For such a hierarchical secure aggregation (HSA) problem, we characterize the optimal multifaceted trade-off between communication (in terms of user-to-relay and relay-to-server communication rates) and secret key generation efficiency (in terms of individual key and source key rates).
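To make the basic idea of secure aggregation concrete, the following is a minimal sketch of textbook additive masking in a plain star network: each user adds a random key to its input, the keys are chosen to sum to zero, and the server recovers only the sum. This is an illustration under stated assumptions, not the hierarchical scheme characterized in the paper: it has no relay layer and no collusion tolerance, and the modulus `Q` and all function names are hypothetical.

```python
import secrets

Q = 2**31 - 1  # hypothetical modulus; inputs and keys are integers mod Q


def zero_sum_keys(num_users, modulus=Q):
    """Draw one random key per user such that all keys sum to zero mod `modulus`."""
    keys = [secrets.randbelow(modulus) for _ in range(num_users - 1)]
    keys.append((-sum(keys)) % modulus)  # last key cancels the others
    return keys


def mask_inputs(inputs, keys, modulus=Q):
    """Each user uploads input + key, so a single upload on its own looks uniform."""
    return [(x + k) % modulus for x, k in zip(inputs, keys)]


def aggregate(uploads, modulus=Q):
    """The server adds the uploads; the zero-sum keys cancel and the sum remains."""
    return sum(uploads) % modulus


inputs = [12, 7, 30, 1]
keys = zero_sum_keys(len(inputs))
uploads = mask_inputs(inputs, keys)
total = aggregate(uploads)
print(total)  # 50
```

In this toy version the keys are correlated through a single shared source of randomness, which is loosely the quantity that the source key rate in the abstract measures; how keys are actually constructed in the hierarchical setting is not specified here.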

preprint, 2026, arXiv

Optimal Rate Region for Multi-server Secure Aggregation with User Collusion

Secure aggregation is a fundamental primitive in privacy-preserving distributed learning systems, where an aggregator aims to compute the sum of users' inputs without revealing individual data. In this paper, we study a multi-server secure aggregation problem in a two-hop network consisting of multiple aggregation servers and multiple users per server, under the presence of user collusion. Each user communicates only with its associated server, while the servers exchange messages to jointly recover the global sum. We adopt an information-theoretic security framework, allowing up to $T$ users to collude with any server. We characterize the complete optimal rate region in terms of user-to-server communication rate, server-to-server communication rate, individual key rate, and source key rate. Our main result shows that the minimum communication and individual key rates are all one symbol per input symbol, while the optimal source key rate is given by $\min\{U+V+T-2,\, UV-1\}$, where $U$ denotes the number of servers and $V$ the number of users per server. The achievability is established via a linear key construction that ensures correctness and security against colluding users, while the converse proof relies on tight entropy bounds derived from correctness and security constraints. The results reveal a fundamental tradeoff between security and key efficiency and demonstrate that the multi-server architecture can significantly reduce the required key randomness compared to single-server secure aggregation. Our findings provide a complete information-theoretic characterization of secure aggregation in multi-server systems with user collusion.
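As a small worked example of the source key rate stated above, the sketch below evaluates $\min\{U+V+T-2,\, UV-1\}$ for a few collusion thresholds. The function name and the input checks are assumptions added for illustration; the abstract does not state which parameter ranges the result covers.

```python
def optimal_source_key_rate(U, V, T):
    """Evaluate min{U + V + T - 2, U*V - 1} from the abstract.

    U: number of servers, V: users per server, T: collusion threshold.
    The validity checks below are illustrative assumptions, not from the paper.
    """
    if U < 1 or V < 1 or T < 0:
        raise ValueError("expected U >= 1, V >= 1, T >= 0")
    return min(U + V + T - 2, U * V - 1)


# With U = 3 servers and V = 4 users per server, U*V - 1 = 11 caps the rate.
for T in (0, 1, 3, 6, 10):
    print(T, optimal_source_key_rate(3, 4, T))
# 0 5
# 1 6
# 3 8
# 6 11
# 10 11
```

The rate grows linearly in T until it reaches UV - 1, after which tolerating more colluding users costs no additional source key under this formula.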

preprint, 2026, arXiv

Peak-Detector: Explainable Peak Detection via Instruction-Tuned Large Language Models in Physiological Sign

Accurate peak detection across diverse cardiac physiological signals, including the Electrocardiogram (ECG), Photoplethysmogram (PPG), Ballistocardiogram (BCG), and Bodyseismography (BSG), is fundamental for cardiovascular monitoring but is often hindered by artifacts and signal variability. Conventional algorithms are typically engineered with expert knowledge for a single signal modality, limiting their generalizability. Conversely, deep learning-based methods often lack interpretability, limiting transparency for expert verification and hindering expert-computer interaction. To address these limitations, we introduce Peak-Detector, a novel framework that leverages instruction-tuned Large Language Models (LLMs) for robust, cross-modal, and explainable peak detection. A core innovation of our framework is a "peak-representation" technique that transforms time-series data into a condensed format, preserving critical event information while significantly reducing signal length. This representation provides a crucial inductive bias, guiding the LLM to reason over physiologically meaningful events rather than raw, noisy data. The model is optimized through a two-stage process: supervised fine-tuning (SFT) followed by reinforcement learning (RL) with a multi-objective reward function. The model's self-explanation capabilities are cultivated by fine-tuning on a custom-built Peak-Explanation dataset. Across four modalities (ECG, PPG, BCG, and BSG) spanning seven datasets (six public benchmarks plus one real-world cohort), Peak-Detector demonstrates strong cross-modal performance, achieving best or tied-best detection under clinically relevant temporal tolerance. Beyond accuracy, the generated rationales surface failure modes and support verification and error analysis.

preprint, 2026, arXiv

Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space

As Large Language Models (LLMs) become increasingly prevalent, their security vulnerabilities have drawn attention. Machine unlearning has been introduced to mitigate these risks by removing the influence of undesirable data. However, existing methods not only rely on a retained dataset to preserve model utility, but also suffer from cumulative catastrophic utility loss under continuous unlearning requests. To solve this dilemma, we propose a novel method, called Rotation Control Unlearning (RCU), which leverages the rotational salience weight of RCU to quantify and control the unlearning degree in the continuous unlearning process. A skew-symmetric loss is designed to construct the cognitive rotation space, where changes in rotational angle can simulate the continuous unlearning process. Furthermore, we design an orthogonal rotation axes regularization to enforce mutually perpendicular rotation directions for continuous unlearning requests, effectively minimizing interference and addressing cumulative catastrophic utility loss. Experiments on multiple datasets confirm that our method achieves SOTA performance without a retained dataset.

preprint, 2026, arXiv

TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning

Multivariate Time-Series (MTS) clustering is crucial for signal processing and data analysis. Although deep learning approaches, particularly those leveraging Contrastive Learning (CL), are prominent for MTS representation, existing CL-based models face two key limitations: 1) neglecting clustering information during positive/negative sample pair construction, and 2) introducing unreasonable inductive biases, e.g., destroying time dependence and periodicity through augmentation strategies, compromising representation quality. This paper, therefore, proposes a Temporal-Frequency Enhanced Contrastive (TFEC) learning framework. To preserve temporal structure while generating low-distortion representations, a temporal-frequency Co-EnHancement (CoEH) mechanism is introduced. Accordingly, a synergistic dual-path representation and cluster distribution learning framework is designed to jointly optimize cluster structure and representation fidelity. Experiments on six real-world benchmark datasets demonstrate TFEC's superiority, achieving 4.48% average NMI gains over SOTA methods, with ablation studies validating the design. The code of the paper is available at: https://github.com/yueliangy/TFEC.
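For readers unfamiliar with the NMI metric quoted above, the following is a minimal sketch of normalized mutual information between a ground-truth labeling and a predicted clustering. It assumes the arithmetic-mean normalization, 2·I(A;B) / (H(A) + H(B)); the abstract does not say which normalization the TFEC evaluation uses, and the function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings put everything in one cluster
    return 2.0 * mutual_information(true_labels, pred_labels) / (h_true + h_pred)


# Cluster IDs are arbitrary, so a relabeled but identical partition scores 1.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
# A partition that is independent of the ground truth scores 0.
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because NMI is invariant to permutations of cluster IDs, it can compare an unsupervised clustering against class labels without first matching clusters to classes.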

preprint, 2026, arXiv

UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis

With the recent surge of generative models, diffusion-based approaches have become mainstream for view synthesis tasks, either in an explicit depth-warp-inpaint or in an implicit end-to-end manner. Despite their success, both paradigms often suffer from noticeable quality degradation, e.g., blurred details and distorted structures, caused by pixel-to-latent compression and diffusion hallucination. In this paper, we investigate diffusion degradation from three key dimensions (i.e., spatial, temporal, and backbone-related) and propose UniFixer, a universal reference-guided framework that fixes diverse degradation artifacts via a coarse-to-fine strategy. Specifically, a reference pre-alignment module is first designed to perform coarse alignment between the reference view and the degraded novel view. A global structure anchoring mechanism then rectifies geometric distortions to ensure structural fidelity, followed by a local detail injection module that recovers fine-grained texture details for high-quality view synthesis. Our UniFixer serves as a plug-and-play refiner that achieves zero-shot fixing across different types of diffusion degradation, and extensive experiments verify our state-of-the-art performance on novel view synthesis and stereo conversion.

preprint, 2026, arXiv

VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection

Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the structural dependencies, domain-specific vulnerability knowledge, and complex program semantics required for accurate detection. Recent Large Language Models (LLMs) have shown strong code understanding ability, but directly prompting them with raw source code often leads to missed vulnerabilities or false alarms, especially when vulnerable and benign functions differ only in subtle semantic details. To address this, we propose VulTriage, a triple-path context augmentation framework for LLM-based vulnerability detection. VulTriage enhances the LLM input through three complementary paths: a Control Path that extracts and verbalizes AST, CFG, and DFG information to expose control and data dependencies; a Knowledge Path that retrieves relevant CWE-derived vulnerability patterns and examples through hybrid dense-sparse retrieval; and a Semantic Path that summarizes the functional behavior of the code before the final judgment. These contexts are integrated into a unified instruction to guide the LLM toward more reliable vulnerability reasoning. Experiments on the PrimeVul pair test set show that VulTriage achieves state-of-the-art performance, outperforming existing deep learning and LLM-based baselines on key pair-wise and classification metrics. Further ablation studies verify the effectiveness of each path, and additional experiments on the Kotlin dataset demonstrate the generalization ability of VulTriage under low-resource and class-imbalanced settings. Our code is available at https://github.com/vinsontang1/VulTriage

preprint, 2026, arXiv

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments that bias subsequent generations. We introduce Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation. SxS interleaves partial disclosures with continued private reasoning in the same context, but releases content only when it is supported by the reasoning so far. To learn such pacing without incentivizing filler, we construct entailment-aligned interleaved trajectories by matching answer prefixes to supporting reasoning prefixes, then train with SFT to acquire the dual-action semantics and RL to recover reasoning performance under the new format. Across two Qwen3 architectures/scales (MoE Qwen3-30B-A3B, dense Qwen3-4B) and both in-domain (AIME25) and out-of-domain (GPQA-Diamond) benchmarks, SxS improves accuracy--content-latency Pareto trade-offs under token-level proxies such as inter-update waiting.

preprint, 2023, arXiv

A portable sub Hertz ultra-stable laser over 1700km highway transportation

We present a sub-Hz-linewidth portable ultrastable laser with a mass of 40 kg and a volume of 400 mm × 280 mm × 450 mm that meets the requirements of automatic frequency locking and road transportation. A dynamic analytical model of the physical parts of the ultrastable laser is established, and the first-order resonance frequency determined by FEA agrees well with the experimentally measured result. To verify the transport performance of the portable ultrastable laser, it is tested with 100 km of actual road transportation and 60 min of continuous vibration, corresponding to 1700 km of road transportation. The success of the test demonstrates that the portable ultrastable laser is very robust. Meanwhile, the median of the laser's linewidth distribution is approximately 0.78 Hz, and the fractional frequency instability is less than 3E-15 at 1 to 10 s averaging time. This value approaches the total noise of 2.0E-15, which includes thermal noise and residual amplitude modulation. This robustness suggests that the portable ultrastable laser may be a good candidate for applications such as optical frequency transfer and metrological systems.

preprint, 2022, arXiv

An EEG-Based Multi-Modal Emotion Database with Both Posed and Authentic Facial Actions for Emotion Analysis

Emotion is an experience associated with a particular pattern of physiological activity along with different physiological, behavioral and cognitive changes. One behavioral change is facial expression, which has been studied extensively over the past few decades. Facial behavior varies with a person's emotion according to differences in terms of culture, personality, age, context, and environment. In recent years, physiological activities have been used to study emotional responses. A typical signal is the electroencephalogram (EEG), which measures brain activity. Most existing EEG-based emotion analysis has overlooked the role of facial expression changes. There exists little research on the relationship between facial behavior and brain signals due to the lack of datasets measuring both EEG and facial action signals simultaneously. To address this problem, we propose to develop a new database by collecting facial expressions, action units, and EEGs simultaneously. We recorded the EEGs and face videos of both posed facial actions and spontaneous expressions from 29 participants with different ages, genders, and ethnic backgrounds. Differing from existing approaches, we designed a protocol to capture the EEG signals by evoking participants' individual action units explicitly. We also investigated the relation between the EEG signals and facial action units. As a baseline, the database has been evaluated through experiments on both posed and spontaneous emotion recognition with images alone, EEG alone, and EEG fused with images, respectively. The database will be released to the research community to advance the state of the art for automatic emotion recognition.

preprint, 2022, arXiv

AutoGCL: Automated Graph Contrastive Learning via Learnable View Generators

Contrastive learning has been widely applied to graph representation learning, where the view generators play a vital role in generating effective contrastive samples. Most of the existing contrastive learning methods employ pre-defined view generation methods, e.g., node drop or edge perturbation, which usually cannot adapt to input data or preserve the original semantic structures well. To address this issue, we propose a novel framework named Automated Graph Contrastive Learning (AutoGCL) in this paper. Specifically, AutoGCL employs a set of learnable graph view generators orchestrated by an auto augmentation strategy, where every graph view generator learns a probability distribution of graphs conditioned on the input. While the graph view generators in AutoGCL preserve the most representative structures of the original graph in the generation of every contrastive sample, the auto augmentation learns policies to introduce adequate augmentation variances in the whole contrastive learning procedure. Furthermore, AutoGCL adopts a joint training strategy to train the learnable view generators, the graph encoder, and the classifier in an end-to-end manner, resulting in topological heterogeneity yet semantic similarity in the generation of contrastive samples. Extensive experiments on semi-supervised learning, unsupervised learning, and transfer learning demonstrate the superiority of our AutoGCL framework over state-of-the-art methods in graph contrastive learning. In addition, the visualization results further confirm that the learnable view generators can deliver more compact and semantically meaningful contrastive samples than the existing view generation methods.

preprint2022arXiv

BM-NAS: Bilevel Multimodal Neural Architecture Search

Deep neural networks (DNNs) have shown superior performance on various multimodal learning problems. However, it often requires considerable effort to adapt DNNs to individual multimodal tasks by manually engineering unimodal features and designing multimodal feature fusion strategies. This paper proposes the Bilevel Multimodal Neural Architecture Search (BM-NAS) framework, which makes the architecture of multimodal fusion models fully searchable via a bilevel searching scheme. At the upper level, BM-NAS selects the inter/intra-modal feature pairs from the pretrained unimodal backbones. At the lower level, BM-NAS learns the fusion strategy for each feature pair, which is a combination of predefined primitive operations. The primitive operations are elaborately designed and they can be flexibly combined to accommodate various effective feature fusion modules such as multi-head attention (Transformer) and Attention on Attention (AoA). Experimental results on three multimodal tasks demonstrate the effectiveness and efficiency of the proposed BM-NAS framework. BM-NAS achieves competitive performance with much less search time and fewer model parameters in comparison with the existing generalized multimodal NAS methods.

preprint2022arXiv

Experimental Test of Contextuality based on State Discrimination with a Single Qubit

Exploring quantum phenomena beyond predictions of any classical model has fundamental importance to understand the boundary of classical and quantum descriptions of nature. As a typical property that a quantum system behaves distinctively from a classical counterpart, contextuality has been studied extensively and verified experimentally in systems composed of at least three levels (qutrit). Here we extend the scope of experimental test of contextuality to a minimal quantum system of only two states (qubit) by implementing the minimum error state discrimination on a single $^{171}$Yb$^+$ ion. We observe a substantial violation of a no-go inequality derived by assuming non-contextuality, and firmly conclude that the measured results of state discrimination cannot be reconciled with any non-contextual description. We also quantify the contextual advantage of state discrimination and the tolerance against quantum noises.

preprint2022arXiv

Exploring Edge Disentanglement for Node Classification

Edges in real-world graphs are typically formed by a variety of factors and carry diverse relation semantics. For example, connections in a social network could indicate friendship, being colleagues, or living in the same neighborhood. However, these latent factors are usually concealed behind mere edge existence due to the data collection and graph formation processes. Despite rapid developments in graph learning in recent years, most models take a holistic approach and treat all edges as equal. One major difficulty in disentangling edges is the lack of explicit supervision. In this work, with close examination of edge patterns, we propose three heuristics and design three corresponding pretext tasks to guide the automatic edge disentanglement. Concretely, these self-supervision tasks are enforced on a designed edge disentanglement module that is trained jointly with the downstream node classification task to encourage automatic edge disentanglement. Channels of the disentanglement module are expected to capture distinguishable relations and neighborhood interactions, and outputs from them are aggregated as node representations. The proposed DisGNN is easy to incorporate into various neural architectures, and we conduct experiments on 6 real-world datasets. Empirical results show that it can achieve significant performance gains.

preprint2022arXiv

GaLactic and Extragalactic All-sky Murchison Widefield Array survey eXtended (GLEAM-X) I: Survey Description and Initial Data Release

We describe a new low-frequency wideband radio survey of the southern sky. Observations covering 72 - 231 MHz and Declinations south of $+30^\circ$ have been performed with the Murchison Widefield Array "extended" Phase II configuration over 2018 - 2020 and will be processed to form data products including continuum and polarisation images and mosaics, multi-frequency catalogues, transient search data, and ionospheric measurements. From a pilot field described in this work, we publish an initial data release covering 1,447 sq. deg over 4h < RA < 13h, -32.7deg < Dec < -20.7deg. We process twenty frequency bands sampling 72 - 231 MHz, with a resolution of $2'$ - $45"$, and produce a wideband source-finding image across 170 - 231MHz with a root-mean-square noise of $1.27\pm0.15$ mJy/beam. Source-finding yields 78,967 components, of which 71,320 are fitted spectrally. The catalogue has a completeness of 98% at $\sim50$mJy, and a reliability of 98.2% at $5σ$ rising to 99.7% at $7σ$. A catalogue is available from Vizier; images are made available on AAO Data Central, SkyView, and the PASA Datastore. This is the first in a series of data releases from the GLEAM-X survey.

preprint2022arXiv

Graph-Guided Network for Irregularly Sampled Multivariate Time Series

In many domains, including healthcare, biology, and climate science, time series are irregularly sampled with varying time intervals between successive readouts and different subsets of variables (sensors) observed at different time points. Here, we introduce RAINDROP, a graph neural network that embeds irregularly sampled and multivariate time series while also learning the dynamics of sensors purely from observational data. RAINDROP represents every sample as a separate sensor graph and models time-varying dependencies between sensors with a novel message passing operator. It estimates the latent sensor graph structure and leverages the structure together with nearby observations to predict misaligned readouts. This model can be interpreted as a graph neural network that sends messages over graphs that are optimized for capturing time-varying dependencies among sensors. We use RAINDROP to classify time series and interpret temporal dynamics on three healthcare and human activity datasets. RAINDROP outperforms state-of-the-art methods by up to 11.4% (absolute F1-score points), including techniques that deal with irregular sampling using fixed discretization and set functions. RAINDROP shows superiority in diverse setups, including challenging leave-sensor-out settings.

preprint2022arXiv

High throughput data-driven design of laser crystallized 2D MoS2 chemical sensors

High throughput characterization and processing techniques are becoming increasingly necessary to navigate multivariable, data-driven design challenges for sensors and electronic devices. For two-dimensional materials, device performance is highly dependent upon a vast array of material properties including number of layers, lattice strain, carrier concentration, defect density, and grain structure. In this work, laser-crystallization was used to locally pattern and transform hundreds of regions of amorphous MoS2 thin films into 2D 2H-MoS2. A high throughput Raman spectroscopy approach was subsequently used to assess the process-dependent structural and compositional variations for each illuminated region, yielding over 5500 distinct non-resonant, resonant, and polarized Raman spectra. The rapid generation of a comprehensive library of structural and compositional data elucidated important trends between structure-property-processing relationships involving laser-crystallized MoS2, including the relationships between grain size, grain orientation, and intrinsic strain. Moreover, extensive analysis of structure/property relationships allowed for intelligent design, and evaluation of major contributions to, device performance in MoS2 chemical sensors. In particular, it is found that sensor performance is strongly dependent on the orientation of the MoS2 grains relative to the crystal plane.

preprint2022arXiv

Improved Sensitivity for Space Domain Awareness Observations with the Murchison Widefield Array

Our previously reported survey of the Low Earth Orbit (LEO) environment using the Murchison Widefield Array (MWA) detected over 70 unique Resident Space Objects (RSOs) over multiple passes, from 20 hours of observations in passive radar mode. In this paper, we extend this work by demonstrating two methods that improve the detection sensitivity of the system. The first method, called shift-stacking, increases the statistical significance of faint RSO signals through the spatially coherent integration of the reflected signal along the RSO's trajectory across the sky. This method was tested on the observations used during our previous blind survey, and we obtained a $75\%$ increase in the total number of detections. The second method re-focuses the MWA to the near-field RSO's position (post-observation), by applying a complex phase correction to each visibility to account for the curved wave-front. The method was tested successfully on an MWA extended array observation of an ISS pass. However, the method is currently limited by signal de-coherence on the long-baselines (due to the hardware constraints of the current correlator). We discuss the sensitivity improvement for RSO detections we expect from the MWA Phase 3 correlator upgrade. We conclude the paper by briefly commenting on future dedicated Space Domain Awareness (SDA) systems that will incorporate MWA technologies.

preprint2022arXiv

Knowledge-Spreader: Learning Facial Action Unit Dynamics with Extremely Limited Labels

Recent studies on the automatic detection of facial action units (AUs) have relied extensively on large-scale annotations. However, manual AU labeling is difficult, time-consuming, and costly. Most existing semi-supervised works ignore the informative cues from the temporal domain, and are highly dependent on densely annotated videos, making the learning process less efficient. To alleviate these problems, we propose a deep semi-supervised framework, Knowledge-Spreader (KS), which differs from conventional methods in two aspects. First, rather than only encoding human knowledge as constraints, KS also learns the Spatial-Temporal AU correlation knowledge in order to strengthen its out-of-distribution generalization ability. Second, we approach KS by applying consistency regularization and pseudo-labeling in multiple student networks alternately and dynamically. It spreads the spatial knowledge from labeled frames to unlabeled data, and completes the temporal information of partially labeled video clips. Thus, the design allows KS to learn AU dynamics from video clips with only one label allocated, which significantly reduces the annotation requirements. Extensive experiments demonstrate that the proposed KS achieves competitive performance compared with the state of the art while using only 2% of the labels on BP4D and 5% of the labels on DISFA. In addition, we test it on our newly developed large-scale comprehensive emotion database, which contains considerable samples across well-synchronized and aligned sensor modalities for easing the scarcity issue of annotations and identities in human affective computing. The new database will be released to the research community.

preprint2022arXiv

Link Prediction on Heterophilic Graphs via Disentangled Representation Learning

Link prediction is an important task that has wide applications in various domains. However, the majority of existing link prediction approaches assume that the given graph follows the homophily assumption, and design similarity-based heuristics or representation learning approaches to predict links. Many real-world graphs, however, are heterophilic graphs, where the homophily assumption does not hold, which challenges existing link prediction methods. Generally, in heterophilic graphs, there are many latent factors causing the link formation, and two linked nodes tend to be similar in one or two factors but might be dissimilar in other factors, leading to low overall similarity. Thus, one way is to learn a disentangled representation for each node, with each vector capturing the latent representation of a node on one factor, which paves the way to model the link formation in heterophilic graphs, resulting in better node representation learning and link prediction performance. However, the work on this is rather limited. Therefore, in this paper, we study a novel problem of exploring disentangled representation learning for link prediction on heterophilic graphs. We propose a novel framework, DisenLink, which can learn disentangled representations by modeling the link formation and perform factor-aware message-passing to facilitate link prediction. Extensive experiments on 13 real-world datasets demonstrate the effectiveness of DisenLink for link prediction on both heterophilic and homophilic graphs. Our codes are available at https://github.com/sjz5202/DisenLink

preprint2022arXiv

Multiple-Photon Resonance Enabled Quantum Interference in Emission Spectroscopy of N_2^+

Quantum interference occurs frequently in the interaction of laser radiation with materials, leading to a series of fascinating effects such as lasing without inversion, electromagnetically induced transparency, Fano resonance, etc. Such quantum interference effects are mostly enabled by single-photon resonance with transitions in the matter, regardless of how many optical frequencies are involved. Here, we demonstrate quantum interference driven by multiple photons in the emission spectroscopy of nitrogen ions that are resonantly pumped by ultrafast infrared laser pulses. In the spectral domain, Fano resonance is observed in the emission spectrum, where a laser-assisted dynamic Stark effect creates the continuum. In the time domain, the fast-evolving emission is measured, revealing the nature of free-induction decay (FID) arising from quantum radiation and molecular cooperativity. These findings clarify the mechanism of coherent emission of nitrogen ions pumped with MIR pump laser and are likely to be universal. The present work opens a route to explore the important role of quantum interference during the interaction of intense laser pulses with materials near multiple photon resonance.

preprint2022arXiv

New criterions on nonexistence of periodic orbits of planar dynamical systems and their applications

Characterizing the existence or nonexistence of periodic orbits is a classical problem with both theoretical importance and many real applications. In this paper, we obtain several new criteria for the nonexistence of periodic orbits of the planar dynamical system $\dot x=y,~\dot y=-g(x)-f(x,y)y$, and show by examples that these criteria are applicable where the known ones are not. Based on these criteria, we further characterize the local topological structures of its equilibrium, which also shows that one of the classical results by A.F. Andreev [Amer. Math. Soc. Transl. 8 (1958), 183--207] on local topological classification of the degenerate equilibrium is incomplete. Finally, as another application of these results, we classify the global phase portraits of a planar differential system, which comes from the third question in the list of the 33 questions posed by A. Gasull and also from a mechanical oscillator under suitable restriction to its parameters.

preprint2022arXiv

Offline-Online Learning of Deformation Model for Cable Manipulation with Graph Neural Networks

Manipulating deformable linear objects by robots has a wide range of applications, e.g., manufacturing and medical surgery. To complete such tasks, an accurate dynamics model for predicting the deformation is critical for robust control. In this work, we deal with this challenge by proposing a hybrid offline-online method to learn the dynamics of cables in a robust and data-efficient manner. In the offline phase, we adopt Graph Neural Network (GNN) to learn the deformation dynamics purely from the simulation data. Then a linear residual model is learned in real-time to bridge the sim-to-real gap. The learned model is then utilized as the dynamics constraint of a trust region based Model Predictive Controller (MPC) to calculate the optimal robot movements. The online learning and MPC run in a closed-loop manner to robustly accomplish the task. Finally, comparative results with existing methods are provided to quantitatively show the effectiveness and robustness.

preprint2022arXiv

Omni-directional Pathloss Measurement Based on Virtual Antenna Array with Directional Antennas

Omni-directional pathloss, which refers to the pathloss when omni-directional antennas are used at the link ends, is essential for system design and evaluation. In the millimeter-wave (mm-Wave) and beyond bands, high gain directional antennas are widely used for channel measurements due to the significant signal attenuation. Conventional methods for omni-directional pathloss estimation are based on a directional scanning sounding (DSS) system, i.e., a single directional antenna placed at the center of a rotator capturing signals from different rotation angles. The omni-directional pathloss is obtained by either summing up all the powers above the noise level or just summing up the powers of detected propagation paths. However, both methods are problematic with the relatively wide main beams and high side-lobes of the directional antennas. In this letter, a directional-antenna-based virtual antenna array (VAA) system is implemented for omni-directional pathloss estimation. The VAA scheme uses the same measurement system as the DSS, yet it offers high angular resolution (i.e., a narrow main beam) and low side-lobes, which are essential for achieving accurate multipath detection in the power angular delay profiles (PADPs) and thereby obtaining accurate omni-directional pathloss. A measurement campaign was designed and conducted in an indoor corridor at 28-30 GHz to verify the effectiveness of the proposed method.

preprint2022arXiv

On the Equity of Nuclear Norm Maximization in Unsupervised Domain Adaptation

Nuclear norm maximization has shown the power to enhance the transferability of unsupervised domain adaptation model (UDA) in an empirical scheme. In this paper, we identify a new property termed equity, which indicates the balance degree of predicted classes, to demystify the efficacy of nuclear norm maximization for UDA theoretically. With this in mind, we offer a new discriminability-and-equity maximization paradigm built on squares loss, such that predictions are equalized explicitly. To verify its feasibility and flexibility, two new losses termed Class Weighted Squares Maximization (CWSM) and Normalized Squares Maximization (NSM), are proposed to maximize both predictive discriminability and equity, from the class level and the sample level, respectively. Importantly, we theoretically relate these two novel losses (i.e., CWSM and NSM) to the equity maximization under mild conditions, and empirically suggest the importance of the predictive equity in UDA. Moreover, it is very efficient to realize the equity constraints in both losses. Experiments of cross-domain image classification on three popular benchmark datasets show that both CWSM and NSM contribute to outperforming the corresponding counterparts.

preprint2022arXiv

Online Graph Learning in Dynamic Environments

Inferring the underlying graph topology that characterizes structured data is pivotal to many graph-based models when pre-defined graphs are not available. This paper focuses on learning graphs in the case of sequential data in dynamic environments. For sequential data, we develop an online version of the classic batch graph learning method. To better track graphs in dynamic environments, we assume graphs evolve in certain patterns such that dynamic priors might be embedded in the online graph learning framework. When the information of these hidden patterns is not available, we use history data to predict the evolution of graphs. Furthermore, dynamic regret analysis of the proposed method is performed and illustrates that our online graph learning algorithms can reach sublinear dynamic regret. Experimental results show that our method is superior to the state-of-the-art methods.

preprint2022arXiv

Properties and device performance of BN thin films grown on GaN by pulsed laser deposition

Wide and ultrawide-bandgap semiconductors lie at the heart of next-generation high-power, high-frequency electronics. Here, we report the growth of ultrawide-bandgap boron nitride (BN) thin films on wide-bandgap gallium nitride (GaN) by pulsed laser deposition. Comprehensive spectroscopic (core level and valence band XPS, FTIR, Raman) and microscopic (AFM and STEM) characterizations confirm the growth of BN thin films on GaN. Optically, we observed that BN/GaN heterostructure is second-harmonic generation active. Moreover, we fabricated the BN/GaN heterostructure-based Schottky diode that demonstrates rectifying characteristics, lower turn-on voltage, and an improved breakdown capability (234 V) as compared to GaN (168 V), owing to the higher breakdown electrical field of BN. Our approach is an early step towards bridging the gap between wide and ultrawide-bandgap materials for potential optoelectronics as well as next-generation high-power electronics.

preprint2022arXiv

Stability of Oxygenated Groups on Pristine and Defective Diamond Surfaces

The surface functionalization of diamond has been extensively studied through a variety of techniques, such as oxidation. Several oxygen groups have been correspondingly detected on the oxidized diamond, such as COC (ether), CO (ketonic), and COH (hydroxyl). However, the composition and relative concentration of these groups on diamond surfaces can be affected by the type of oxygenation treatment and the diamond surface quality. To investigate the stability of the oxygenated groups at specific diamond surfaces, we evaluated through fully atomistic reactive molecular mechanics (FARMM) simulations, using the ReaxFF force field, the formation energies of CO, COC, and COH groups on pristine and defective diamond surfaces (110), (111), and (311). According to our findings, the COH group has the lowest formation energy on a perfect (110) surface, while the COC group is favored on a defective surface. As for the (111) surface, the COC group is the most stable for both pristine and defective surfaces. Similarly, the COC group is also the most stable one on the defective/perfect (311) surface. In this way, our results suggest that if the (110) surface is the major exposed facet in a diamond film, the most adsorbed oxygen group could be either COH or COC, with the prevalence of COC depending on the level of surface defects.

preprint2022arXiv

Text Spotting Transformers

In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild. TESTR builds upon a single encoder and dual decoders for the joint text-box control point regression and character recognition. Unlike most existing literature, our method is free from Region-of-Interest operations and heuristics-driven post-processing procedures; TESTR is particularly effective when dealing with curved text-boxes, where special care is needed for the adaptation of the traditional bounding-box representations. We show that our canonical representation of control points is suitable for text instances in both Bezier curve and polygon annotations. In addition, we design a bounding-box guided polygon detection (box-to-polygon) process. Experiments on curved and arbitrarily shaped datasets demonstrate state-of-the-art performance of the proposed TESTR algorithm.

preprint2022arXiv

Time-varying Graph Learning Under Structured Temporal Priors

This paper endeavors to learn time-varying graphs by using structured temporal priors that assume underlying relations between any two graphs in the graph sequence. Different from many existing chain-structure-based methods, in which priors like temporal homogeneity can only describe the variations of two consecutive graphs, we propose a structure named temporal graph to characterize the underlying real temporal relations. Under this framework, the chain structure is actually a special case of our temporal graph. We further propose using the Alternating Direction Method of Multipliers (ADMM), a distributed algorithm, to solve the induced optimization problem. Numerical experiments demonstrate the superiority of our method.

preprint2022arXiv

Unifying Motion Deblurring and Frame Interpolation with Events

Slow shutter speed and long exposure time of frame-based cameras often cause visual blur and loss of inter-frame information, degenerating the overall quality of captured videos. To this end, we present a unified framework of event-based motion deblurring and frame interpolation for blurry video enhancement, where the extremely low latency of events is leveraged to alleviate motion blur and facilitate intermediate frame prediction. Specifically, the mapping relation between blurry frames and sharp latent images is first predicted by a learnable double integral network, and a fusion network is then proposed to refine the coarse results via utilizing the information from consecutive blurry inputs and the concurrent events. By exploring the mutual constraints among blurry frames, latent images, and event streams, we further propose a self-supervised learning framework to enable network training with real-world blurry videos and events. Extensive experiments demonstrate that our method compares favorably against the state-of-the-art approaches and achieves remarkable performance on both synthetic and real-world datasets.

preprint2021arXiv

A New Design of Cache-aided Multiuser Private Information Retrieval with Uncoded Prefetching

In the problem of cache-aided multiuser private information retrieval (MuPIR), a set of $K_{\rm u}$ cache-equipped users wish to privately download a set of messages from $N$ distributed databases each holding a library of $K$ messages. The system works in two phases: a cache placement (prefetching) phase, in which the users fill up their cache memory, and a private delivery phase, in which the users' demands are revealed and they download an answer from each database so that their desired messages can be recovered while each individual database learns nothing about the identities of the requested messages. The goal is to design the placement and the private delivery phases such that the load, which is defined as the total number of downloaded bits normalized by the message size, is minimized given any user memory size. This paper considers the MuPIR problem with two messages and an arbitrary number of users and databases, where uncoded prefetching is assumed, i.e., the users directly copy some bits from the library as their cached contents. We propose a novel MuPIR scheme inspired by the Maddah-Ali and Niesen (MAN) coded caching scheme. The proposed scheme achieves a lower load than any existing scheme, especially the product design (PD), and is shown to be optimal within a factor of $8$ in general and exactly optimal in the very high or low memory regimes.

preprint2021arXiv

A Reactive Molecular Dynamics Study of Hydrogenation on Diamond Surfaces

Hydrogenated diamond has been regarded as a promising material in electronic device applications, especially in field-effect transistors (FETs). However, the quality of diamond hydrogenation has not yet been established, nor has the specific orientation that would provide the optimum hydrogen coverage. In addition, most theoretical work in the literature uses models with 100% hydrogenated diamond surfaces to study electronic properties, which is far from the experimentally observed hydrogen coverage. In this work, we have carried out a detailed study using fully atomistic reactive molecular dynamics (MD) simulations on low-index diamond surfaces, i.e., (001), (013), (110), (113) and (111), to evaluate the quality and hydrogenation thresholds on different diamond surfaces and their possible effects on electronic properties. Our simulation results indicate that 100% surface hydrogenation of these surfaces is hard to achieve because of the steric repulsion between the terminated hydrogen atoms. Among all the considered surfaces, the (001), (110), and (113) surfaces incorporate a larger number of hydrogen atoms and passivate the surface dangling bonds. Our results on hydrogen stability also suggest that these surfaces with optimum hydrogen coverage are robust under extreme conditions and could provide homogeneous p-type surface conductivity in the diamond surfaces, a key requirement for high-field, high-frequency device applications.

preprint2021arXiv

A Two-stream Neural Network for Pose-based Hand Gesture Recognition

Pose-based hand gesture recognition has been widely studied in recent years. Compared with full body action recognition, hand gesture involves joints that are more spatially closely distributed with stronger collaboration. This nature requires a different approach from action recognition to capture the complex spatial features. Many gesture categories, such as "Grab" and "Pinch", have very similar motion or temporal patterns, posing a challenge for temporal processing. To address these challenges, this paper proposes a two-stream neural network with one stream being a self-attention based graph convolutional network (SAGCN) extracting the short-term temporal information and hierarchical spatial information, and the other being a residual-connection enhanced bidirectional Independently Recurrent Neural Network (RBi-IndRNN) for extracting long-term temporal information. The self-attention based graph convolutional network has a dynamic self-attention mechanism to adaptively exploit the relationships of all hand joints in addition to the fixed topology and local feature extraction in the GCN. On the other hand, the residual-connection enhanced Bi-IndRNN extends an IndRNN with the capability of bidirectional processing for temporal modelling. The two streams are fused together for recognition. The Dynamic Hand Gesture dataset and First-Person Hand Action dataset are used to validate its effectiveness, and our method achieves state-of-the-art performance.

preprint2021arXiv

Feedback-based Digital Higher-order Terminal Sliding Mode for 6-DOF Industrial Manipulators

The precise motion control of a multi-degree of freedom (DOF) robot manipulator is always challenging due to its nonlinear dynamics, disturbances, and uncertainties. Because most manipulators are controlled by digital signals, a novel higher-order sliding mode controller in the discrete-time form with time delay estimation is proposed in this paper. The dynamic model of the manipulator used in the design allows proper handling of nonlinearities, uncertainties and disturbances involved in the problem. Specifically, parametric uncertainties and disturbances are handled by the time delay estimation and the nonlinearity of the manipulator is addressed by the feedback structure of the controller. The combination of terminal sliding mode surface and higher-order control scheme in the controller guarantees a fast response with a small chattering amplitude. Moreover, the controller is designed with a modified sliding mode surface and variable-gain structure, so that the performance of the controller is further enhanced. We also analyze the condition to guarantee the stability of the closed-loop system in this paper. Finally, the simulation and experimental results prove that the proposed control scheme has a precise performance in a robot manipulator system.

preprint2021arXiv

Information retrieval and eigenstates coalescence in a non-Hermitian quantum system with anti-$\mathcal{PT}$ symmetry

Non-Hermitian systems with parity-time reversal ($\mathcal{PT}$) or anti-$\mathcal{PT}$ symmetry have attracted wide interest owing to their unique characteristics and counterintuitive phenomena. One of the most extraordinary features is the presence of an exceptional point (EP), across which a phase transition with spontaneously broken $\mathcal{PT}$ symmetry takes place. We implement a Floquet Hamiltonian of a single qubit with anti-$\mathcal{PT}$ symmetry by periodically driving a dissipative quantum system of a single trapped ion. With stroboscopic emission and quantum state tomography, we obtain the time evolution of the density matrix for an arbitrary initial state, and directly demonstrate information retrieval, eigenstates coalescence, and topological energy spectra as unique features of non-Hermitian systems.

preprint 2021 arXiv

Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks

Many manipulation tasks require robots to interact with unknown environments. In such applications, the ability to adapt the impedance according to different task phases and environment constraints is crucial for safety and performance. Although many approaches based on deep reinforcement learning (RL) and learning from demonstration (LfD) have been proposed to obtain variable impedance skills on contact-rich manipulation tasks, these skills are typically task-specific and can be sensitive to changes in task settings. This paper proposes an inverse reinforcement learning (IRL) based approach to recover both the variable impedance policy and the reward function from expert demonstrations. We explore different action spaces for the reward function to achieve a more general representation of expert variable impedance skills. Experiments on two variable impedance tasks (Peg-in-Hole and Cup-on-Plate) were conducted both in simulation and on a real FANUC LR Mate 200iD/7L industrial robot. Comparisons with behavior cloning and force-based IRL show that the reward function learned in the gain action space transfers better than the one learned in the force space. Experiment videos are available at https://msc.berkeley.edu/research/impedance-irl.html.

preprint 2021 arXiv

Uncoordinated Spectrum Sharing in Millimeter Wave Networks Using Carrier Sensing

We propose using Carrier Sensing (CS) for distributed interference management in millimeter-wave (mmWave) cellular networks where spectrum is shared by multiple operators that do not coordinate among themselves. In addition, even the base station (BS) sites can be shared by the operators. We describe important challenges in using traditional CS in this setting and propose enhanced CS protocols to address them. Using stochastic geometry, we develop a general framework for downlink coverage probability analysis of our shared mmWave network in the presence of CS and derive the downlink coverage probability expressions for several CS protocols. To the best of our knowledge, our work is the first to investigate and analyze (using stochastic geometry) CS for mmWave networks with spectrum and BS sites shared among non-coordinating operators. We evaluate the downlink coverage probability of our shared mmWave network using simulations as well as numerical examples based on our analysis. Our evaluations show that, for higher values of the signal-to-interference-plus-noise ratio (SINR), our proposed enhancements improve the downlink coverage probability compared with using no CS. Interestingly, our evaluations also reveal that for lower values of SINR, not using any CS is the best strategy in terms of the downlink coverage probability.

140 published item(s)

A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

Beyond Physical Labels: Redefining Domains for Robust WiFi-based Gesture Recognition

Breaking Coordinate Overfitting: Geometry-Aware WiFi Sensing for Cross-Layout 3D Pose Estimation

Cross-Subject Generalization for EEG Decoding: A Survey of Deep Learning Methods

DexH2R: Task-oriented Dexterous Manipulation from Human to Robots

DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models

FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models

Hierarchical Secure Aggregation with Heterogeneous Security Constraints and Arbitrary User Collusion

Optimal Communication and Key Rate Region for Hierarchical Secure Aggregation with User Collusion

Optimal Rate Region for Multi-server Secure Aggregation with User Collusion

Peak-Detector: Explainable Peak Detection via Instruction-Tuned Large Language Models in Physiological Sign

Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space

TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning

UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis

VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

A portable sub Hertz ultra-stable laser over 1700km highway transportation

An EEG-Based Multi-Modal Emotion Database with Both Posed and Authentic Facial Actions for Emotion Analysis

AutoGCL: Automated Graph Contrastive Learning via Learnable View Generators

BM-NAS: Bilevel Multimodal Neural Architecture Search

Experimental Test of Contextuality based on State Discrimination with a Single Qubit

Exploring Edge Disentanglement for Node Classification

GaLactic and Extragalactic All-sky Murchison Widefield Array survey eXtended (GLEAM-X) I: Survey Description and Initial Data Release

Graph-Guided Network for Irregularly Sampled Multivariate Time Series

High throughput data-driven design of laser crystallized 2D MoS2 chemical sensors

Improved Sensitivity for Space Domain Awareness Observations with the Murchison Widefield Array

Knowledge-Spreader: Learning Facial Action Unit Dynamics with Extremely Limited Labels

Link Prediction on Heterophilic Graphs via Disentangled Representation Learning

Multiple-Photon Resonance Enabled Quantum Interference in Emission Spectroscopy of N$_2^+$

New criterions on nonexistence of periodic orbits of planar dynamical systems and their applications

Offline-Online Learning of Deformation Model for Cable Manipulation with Graph Neural Networks

Omni-directional Pathloss Measurement Based on Virtual Antenna Array with Directional Antennas

On the Equity of Nuclear Norm Maximization in Unsupervised Domain Adaptation

Online Graph Learning in Dynamic Environments

Properties and device performance of BN thin films grown on GaN by pulsed laser deposition

Stability of Oxygenated Groups on Pristine and Defective Diamond Surfaces

Text Spotting Transformers

Time-varying Graph Learning Under Structured Temporal Priors

Unifying Motion Deblurring and Frame Interpolation with Events

A New Design of Cache-aided Multiuser Private Information Retrieval with Uncoded Prefetching

A Reactive Molecular Dynamics Study of Hydrogenation on Diamond Surfaces

A Two-stream Neural Network for Pose-based Hand Gesture Recognition

Feedback-based Digital Higher-order Terminal Sliding Mode for 6-DOF Industrial Manipulators

Information retrieval and eigenstates coalescence in a non-Hermitian quantum system with anti-$\mathcal{PT}$ symmetry

Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks

Uncoordinated Spectrum Sharing in Millimeter Wave Networks Using Carrier Sensing

A Multi-view CNN-based Acoustic Classification System for Automatic Animal Species Identification

A New Design Framework on Device-to-Device Coded Caching with Optimal Rate and Significantly Less Subpacketizations

Adversarial Imitation Attack

Adversarial Representation Learning for Robust Patient-Independent Epileptic Seizure Detection

Automatic Image Labelling at Pixel Level

Berry curvature memory through electrically driven stacking transitions

BeSense: Leveraging WiFi Channel Data and Computational Intelligence for Behavior Analysis

Brain2Object: Printing Your Mind from Brain Signals with Spatial Correlation Embedding

Cache-aided Interference Management using Hypercube Combinatorial Cache Design with Reduced Subpacketizations and Order Optimal Sum-Degrees of Freedom

Cache-aided Interference Management Using Hypercube Combinatorial Cache Designs

Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning

Entity Profiling in Knowledge Graphs

Graph Computing based Distributed State Estimation with PMUs

High-performance frequency stabilization of ultraviolet diode lasers by using dichroic atomic vapor spectroscopy and transfer cavity

Improving Spiking Sparse Recovery via Non-Convex Penalties

Improving Unsupervised Domain Adaptation by Reducing Bi-level Feature Redundancy

Linear Model based Geometry Coding for Lidar Acquired Point Clouds

Literature Triage on Genomic Variation Publications by Knowledge-enhanced Multi-channel CNN

Magnitude and Spatial Distribution Control of the Supercurrent in Bi2O2Se-Based Josephson Junction

Multi-task Generative Adversarial Learning on Geometrical Shape Reconstruction from EEG Brain Signals

Precision measurements with cold atoms and trapped ions

ProbaNet: Proposal-balanced Network for Object Detection

Reply to "On the heat transfer across a vacuum gap mediated by Casimir force"

The Development of Non-coherent Passive Radar Techniques for Space Situational Awareness with the Murchison Widefield Array

Transparent Metamaterial Absorber with Broadband RCS Reduction for Solar Arrays

Unidirectional Pumping of Phonons by Magnetization Dynamics