Source author record

Xiaoyu Zhang

Xiaoyu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

30 works

22 topics

4 close collaborators

preprint · 2026 · arXiv

A Hybrid Framework Combining Autoregression and Common Factors for Matrix Time Series

Matrix-valued time series are ubiquitous in modern economics and finance, yet modeling them requires navigating a trade-off between flexibility and parsimony. We propose the Matrix Autoregressive model with Common Factors (MARCF), a unified framework for high-dimensional matrix time series that bridges the structural gap between the Matrix Autoregression (MAR) and Matrix Factor Model (MFM). While MAR typically assumes distinct predictor and response subspaces and MFM enforces identical ones, MARCF explicitly characterizes the intersection of these subspaces. By decomposing the coefficient matrices into common, predictor-specific, and response-specific components, the framework accommodates distinct input and output structures while exploiting their overlap for dimension reduction. We develop a regularized gradient descent estimator that is scalable for high-dimensional data and can efficiently handle the non-convex parameter space. Theoretical analysis establishes local linear convergence of the algorithm and statistical consistency of the estimator under high-dimensional scaling. The estimation efficiency and interpretability of the proposed methods are demonstrated through simulations and an application to global macroeconomic forecasting.

preprint · 2026 · arXiv

CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

Vision-language models (VLMs) have shown remarkable ability in aligning visual and textual representations, enabling a wide range of multimodal applications. However, their large-scale training data inevitably raises concerns about privacy, copyright, and undesirable content, creating a strong need for machine unlearning. While existing studies mainly focus on single-shot unlearning, practical VLM deployment often involves sequential removal requests over time, giving rise to continual machine unlearning. In this work, we make the first attempt to study continual unlearning for VLMs and identify three key challenges in this setting: effectiveness in removing target knowledge, fidelity in preserving retained model utility, and persistence in preventing knowledge re-emergence under sequential updates. To address these challenges, we propose CATA, a conflict-averse task arithmetic method that represents each forget request as an unlearning task vector. By maintaining historical task vectors and performing sign-aware conflict-averse aggregation, CATA suppresses conflicting update components that may weaken previous forgetting effects. Extensive experiments under both single-shot and continual settings show that CATA outperforms baselines in terms of forgetting effectiveness, model fidelity, and forgetting persistence.
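
The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of one way sign-aware, conflict-averse merging of task vectors could look. The function names `conflict_averse_merge` and `apply_task_vector`, and the rule itself (drop a component of the new request when its sign opposes the summed history), are assumptions for illustration and are not taken from the CATA paper.

```python
from typing import List

Vector = List[float]


def conflict_averse_merge(history: List[Vector], new_task: Vector) -> Vector:
    """Merge a new unlearning task vector with the stored historical ones.

    Hypothetical rule: a component of the new vector is dropped (treated as 0)
    when its sign opposes the sign of the summed historical components, so it
    cannot cancel earlier forgetting updates.
    """
    dim = len(new_task)
    past = [sum(vec[i] for vec in history) for i in range(dim)]
    merged = []
    for i in range(dim):
        conflicts = past[i] * new_task[i] < 0  # strictly opposite signs
        kept = 0.0 if conflicts else new_task[i]
        merged.append(past[i] + kept)
    return merged


def apply_task_vector(weights: Vector, task: Vector, scale: float = 1.0) -> Vector:
    """Task arithmetic: shift the base weights along the merged vector."""
    return [w + scale * t for w, t in zip(weights, task)]


# Two earlier forget requests and one new request (toy 3-parameter model).
history = [[0.5, -0.2, 0.0], [0.1, -0.1, 0.3]]
new_task = [-0.4, -0.3, 0.2]
merged = conflict_averse_merge(history, new_task)
```

In this toy run the first component of the new request points against the accumulated history and is suppressed, while the other two components are added as usual.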

preprint · 2026 · arXiv

From Static Risk to Dynamic Trajectories: Toward World-Model-Inspired Clinical Prediction

Clinical decision-making is a feedback system where risk estimates influence treatment, which in turn changes disease trajectories, and both shape clinicians' measurement practices. Static prediction often fails clinically: models trained on observational care logs conflate disease biology with clinician behavior, particularly under treatment confounder feedback and irregular or informative observation. This Review focuses on intervention-aware disease trajectory modeling in clinical AI--methods estimating patient-specific longitudinal disease evolution and assessing trajectory changes under alternative treatments. We organize the field around six linked components: three decision tasks (factual forecasting, counterfactual estimation, policy evaluation) and three data-generating mechanisms (disease evolution, treatment assignment, observation process) that determine identifiability. We present the first unified framework bridging forecasting, counterfactual trajectories, and policy evaluation across discrete/continuous time, explicitly addressing treatment assignment, time-varying confounding, and observation bias. We synthesize key method families (multistate/joint models, temporal point-process, deep sequence architectures, longitudinal causal inference), map them to relevant components, and align evaluation with claim strength via overlap diagnostics, uncertainty quantification, off-policy robustness, and target-trial validation. This synthesis advances benchmark prediction to decision-grade clinical evidence, enabling treatment-sensitive individualized futures, pre-deployment policy stress-testing, and safer closed-loop learning health systems that adapt/abstain when evidence is insufficient.

preprint · 2026 · arXiv

Minimax Optimal Robust Sparse Regression with Heavy-Tailed Designs: A Gradient-Based Approach

We investigate high-dimensional sparse regression when both the noise and the design matrix exhibit heavy-tailed behavior. Standard algorithms typically fail in this regime, as heavy-tailed covariates distort the empirical risk geometry. We propose a unified framework, Robust Iterative Gradient descent with Hard Thresholding (RIGHT), which employs a robust gradient estimator to bypass the need for higher-order moment conditions. Our analysis reveals a fundamental decoupling phenomenon: in linear regression, the estimation error rate is governed by the noise tail index, while the sample complexity required for stability is governed by the design tail index. This implies that while heavy-tailed noise limits precision, heavy-tailed designs primarily raise the sample size barrier for convergence. In contrast, for logistic regression, we show that the bounded gradient naturally robustifies the estimator against heavy-tailed designs, restoring standard parametric rates. We derive matching minimax lower bounds to prove that RIGHT achieves optimal estimation accuracy and sample complexity across these regimes, without requiring sample splitting or the existence of the population risk.
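
As a rough illustration of the general recipe (gradient step with a robust gradient estimate, followed by hard thresholding), here is a small sketch. It is not the RIGHT estimator from the paper: the coordinate-wise trimmed mean is a stand-in robust aggregator chosen for simplicity, the toy data use a Gaussian design with a few corrupted responses rather than heavy tails, and all names (`trimmed_mean`, `hard_threshold`, `robust_iht`) and parameter values are assumptions.

```python
import random
from typing import List


def trimmed_mean(values: List[float], trim: float) -> float:
    """Mean after discarding the `trim` fraction of values in each tail."""
    k = int(len(values) * trim)
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


def hard_threshold(beta: List[float], s: int) -> List[float]:
    """Keep the s largest-magnitude coordinates and zero the rest."""
    keep = set(sorted(range(len(beta)), key=lambda j: -abs(beta[j]))[:s])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def robust_iht(X, y, s, step=0.5, trim=0.05, iters=150):
    """Iterative hard thresholding for least squares with a trimmed-mean gradient."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        residuals = [sum(xj * bj for xj, bj in zip(x, beta)) - yi
                     for x, yi in zip(X, y)]
        # Robustly aggregate the per-sample gradients, one coordinate at a time.
        grad = [trimmed_mean([r * x[j] for r, x in zip(residuals, X)], trim)
                for j in range(d)]
        beta = hard_threshold([b - step * g for b, g in zip(beta, grad)], s)
    return beta


random.seed(0)
n, d = 300, 8
truth = [3.0, -2.0] + [0.0] * (d - 2)
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(xj * tj for xj, tj in zip(x, truth)) for x in X]
for i in range(6):  # corrupt a few responses with gross outliers
    y[i] += 50.0
estimate = robust_iht(X, y, s=2)
```

The thresholding step enforces the sparsity level `s` at every iteration, while the trimming keeps a handful of extreme per-sample gradients from dominating the update.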

preprint · 2026 · arXiv

Reduced-Rank Autoregressive Model for High-Dimensional Multivariate Network Time Series

Multivariate network time series are ubiquitous in modern systems, yet existing network autoregressive models typically treat nodes as scalar processes, ignoring cross-variable spillovers. To capture these complex interactions without the curse of dimensionality, we propose the Reduced-Rank Network Autoregressive (RRNAR) model. Our framework introduces a separable bilinear transition structure that couples the known network topology with a learnable low-rank variable subspace. We estimate the model using a novel Scaled Gradient Descent (ScaledGD) algorithm, explicitly designed to bridge the gap between rigid network scalars and flexible factor components. Theoretically, we establish non-asymptotic error bounds under a novel distance metric. A key finding is a network-induced blessing of dimensionality: for sparse networks, the estimation accuracy for network parameters improves as the network size grows. Applications to traffic and server monitoring networks demonstrate that RRNAR significantly outperforms univariate and unstructured benchmarks by identifying latent cross-channel propagation mechanisms.

preprint · 2026 · arXiv

Stage Light is Sequence$^2$: Multi-Light Control via Imitation Learning

Music-inspired Automatic Stage Lighting Control (ASLC) has gained increasing attention in recent years due to the substantial time and financial costs associated with hiring and training professional lighting engineers. However, existing methods suffer from several notable limitations: the low interpretability of rule-based approaches, the restriction to single-primary-light control in music-to-color-space methods, and the limited transferability of music-to-controlling-parameter frameworks. To address these gaps, we propose SeqLight, a hierarchical deep learning framework that maps music to multi-light Hue-Saturation-Value (HSV) space. Our approach first customizes SkipBART, an end-to-end single primary light generation model, to predict the full light color distribution for each frame, followed by hybrid Imitation Learning (IL) techniques to derive an effective decomposition strategy that distributes the global color distribution among individual lights. Notably, the light decomposition module can be trained under varying venue-specific lighting configurations using only mixed light data and no professional demonstrations, thereby flexibly adapting across diverse venues. In this stage, we formulate the light decomposition task as a Goal-Conditioned Markov Decision Process (GCMDP), construct an expert demonstration set inspired by Hindsight Experience Replay (HER), and introduce a three-phase IL training pipeline, achieving strong generalization capability. To validate our IL solution for the proposed GCMDP, we conduct a series of quantitative analyses and a human study. The code and trained models are provided at https://github.com/RS2002/SeqLight .

preprint · 2025 · arXiv

Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?

Stage lighting is a vital component in live music performances, shaping an engaging experience for both musicians and audiences. In recent years, Automatic Stage Lighting Control (ASLC) has attracted growing interest due to the high costs of hiring or training professional lighting engineers. However, most existing ASLC solutions only classify music into limited categories and map them to predefined light patterns, resulting in formulaic and monotonous outcomes that lack rationality. To address this gap, this paper presents Skip-BART, an end-to-end model that directly learns from experienced lighting engineers and predicts vivid, human-like stage lighting. To the best of our knowledge, this is the first work to conceptualize ASLC as a generative task rather than merely a classification problem. Our method adapts the BART model to take audio music as input and produce light hue and value (intensity) as output, incorporating a novel skip connection mechanism to enhance the relationship between music and light within the frame grid. To address the lack of available datasets, we create the first stage lighting dataset, along with several pre-training and transfer learning techniques to improve model training with limited data. We validate our method through both quantitative analysis and a human evaluation, demonstrating that Skip-BART outperforms conventional rule-based methods across all evaluation metrics and shows only a limited gap compared to real lighting engineers. To support further research, we have made our self-collected dataset, code, and trained model parameters available at https://github.com/RS2002/Skip-BART .

preprint · 2023 · arXiv

DREAM: Debugging and Repairing AutoML Pipelines

Deep Learning models have become an integrated component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model architecture and hyperparameters for a given task. Like other software systems, existing AutoML systems suffer from bugs. We identify two common and severe bugs in AutoML, performance bug (i.e., searching for the desired model takes an unreasonably long time) and ineffective search bug (i.e., AutoML systems are not able to find an accurate enough model). After analyzing the workflow of AutoML, we observe that existing AutoML systems overlook potential opportunities in search space, search method, and search feedback, which results in performance and ineffective search bugs. Based on our analysis, we design and implement DREAM, an automatic debugging and repairing system for AutoML systems. It monitors the process of AutoML to collect detailed feedback and automatically repairs bugs by expanding search space and leveraging a feedback-driven search strategy. Our evaluation results show that DREAM can effectively and efficiently repair AutoML bugs.

preprint · 2022 · arXiv

Entropy-driven order in an array of nanomagnets

Long-range ordering is typically associated with a decrease in entropy. Yet, it can also be driven by increasing entropy in certain special cases. We demonstrate that artificial spin ice arrays of single-domain nanomagnets can be designed to produce entropy-driven order. We focus on the tetris artificial spin ice structure, a highly frustrated array geometry with a zero-point Pauli entropy, which is formed by selectively creating regular vacancies on the canonical square ice lattice. We probe thermally active tetris artificial spin ice both experimentally and through simulations, measuring the magnetic moments of the individual nanomagnets. We find two-dimensional magnetic ordering in one subset of these moments, which we demonstrate to be induced by disorder (i.e., increased entropy) in another subset of the moments. In contrast with other entropy-driven systems, the discrete degrees of freedom in tetris artificial spin ice are binary and are both designable and directly observable at the microscale, and the entropy of the system is precisely calculable in simulations. This example, in which the system's interactions and ground state entropy are well-defined, expands the experimental landscape for the study of entropy-driven ordering.

preprint · 2022 · arXiv

MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing

Camera-based contactless photoplethysmography refers to a set of popular techniques for contactless physiological measurement. The current state-of-the-art neural models are typically trained in a supervised manner using videos accompanied by gold standard physiological measurements. However, they often generalize poorly to out-of-domain examples (i.e., videos that are unlike those in the training set). Personalizing models can help improve model generalizability, but many personalization techniques still require some gold standard data. To help alleviate this dependency, in this paper, we present a novel mobile sensing system called MobilePhys, the first mobile personalized remote physiological sensing system, that leverages both front and rear cameras on a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions/intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms the state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated or personalized camera-based contactless PPG models generated from our proposed dual-camera mobile sensing system will open the door for numerous future applications such as smart mirrors, fitness and mobile health applications.

preprint · 2022 · arXiv

Nonparametric Quantile Regression for Homogeneity Pursuit in Panel Data Models

Many panel data sets exhibit latent subgroup effects among individuals, and it is important to identify these groups correctly, since pooling the information of individuals within each group can significantly improve the efficiency of the resulting estimators. However, the parametric and semiparametric relationships currently assumed between the response and predictors may be misspecified, leading to a wrong grouping result; a nonparametric approach can be considered to avoid such mistakes. Moreover, the response may depend on predictors in different ways at various quantile levels, and the corresponding grouping structure may also vary. To tackle these problems, this article proposes a nonparametric quantile regression method for homogeneity pursuit in panel data models with individual effects, and a pairwise fused penalty is used to automatically select the number of groups. The asymptotic properties are established, and an ADMM algorithm is also developed. The finite sample performance is evaluated by simulation experiments, and the usefulness of the proposed methodology is further illustrated by an empirical example.

preprint · 2022 · arXiv

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. In addition, Tracks 1 and 3 target improved fidelity (PSNR), while Track 2 targets enhanced perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh
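
For readers unfamiliar with the fidelity metric named above, PSNR is conventionally defined as 10·log10(peak² / MSE). The sketch below computes it for flat lists of pixel values. The function name `psnr` and the 8-bit peak of 255 are assumptions here; the challenge's exact evaluation protocol (color space, per-frame averaging) is not described in this abstract.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], enhanced: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Every pixel is off by exactly one grey level, so the MSE is 1.
score = psnr([10, 20, 30, 40], [11, 19, 31, 39])
```

With an MSE of 1 and a peak of 255, the score is 20·log10(255), roughly 48.13 dB.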

preprint · 2022 · arXiv

On Stability of Two Kinds of Delayed Chemical Reaction Networks

For the networks that are linear conjugate to complex balanced systems, the delayed version may include two classes of networks: one class is still linear conjugate to the delayed complex balanced network, the other is not. In this paper, we prove the existence of the first class of networks, and emphasize the local asymptotic stability relative to a certain defined invariant set. For the second class of systems, we define a special subclass and derive the local asymptotic stability for the subclass. Two examples are provided to illustrate our results.

preprint · 2022 · arXiv

Perception-and-Regulation Network for Salient Object Detection

Effective fusion of different types of features is the key to salient object detection. Most existing network designs are based on the subjective experience of scholars, and the feature fusion process does not consider the relationship between the fused features and the highest-level features. In this paper, we focus on the feature relationship and propose a novel global attention unit, which we term the "perception-and-regulation" (PR) block, that adaptively regulates the feature fusion process by explicitly modeling interdependencies between features. The perception part uses the structure of fully-connected layers in classification networks to learn the size and shape of objects. The regulation part selectively strengthens and weakens the features to be fused. An imitating eye observation module (IEO) is further employed for improving the global perception ability of the network. The imitation of foveal vision and peripheral vision enables IEO to scrutinize highly detailed objects and to organize the broad spatial scene to better segment objects. Extensive experiments conducted on SOD datasets demonstrate that the proposed method performs favorably against 22 state-of-the-art methods.

preprint · 2022 · arXiv

Squeezed metallic droplet with tunable Kubo gap and charge injection in transition metal dichalcogenides

Shrinking a bulk metal to the nanoscale leads to discrete electronic energy levels separated by the so-called Kubo gap. Renormalization of the electronic properties with a tunable and size-dependent Kubo gap gives rise to fascinating photon emission and electron tunneling. In contrast with the usual three-dimensional (3D) metal clusters, here we demonstrate that a Kubo gap can be achieved with a two-dimensional (2D) metallic transition metal dichalcogenide (i.e., 1T'-phase MoTe2) nanocluster embedded in a semiconducting polymorph (i.e., 1H-phase MoTe2). Such a 1T'-1H MoTe2 nanodomain resembles a 3D metallic droplet squeezed into a 2D space, which shows a strong polarization catastrophe while simultaneously maintaining its bond integrity, a feature absent in traditional delta-gapped 3D clusters. The weak screening of the host 2D MoTe2 leads to photon emission of such pseudo-metallic systems and a ballistic injection of carriers in the 1T'-1H-1T' homojunctions, which may find applications in sensors and 2D reconfigurable devices.

preprint · 2022 · arXiv

Towards Programming Adaptive Linear Neural Networks Through Chemical Reaction Networks

This paper is concerned with programming adaptive linear neural networks (ALNNs) using chemical reaction networks (CRNs) equipped with mass-action kinetics. By individually programming the forward propagation and the backpropagation of ALNNs, and by utilizing the permeation walls technique, we construct a powerful CRN that possesses the function of ALNNs, in particular the function of automatic computation. We also provide theoretical analysis and a case study to support our construction. The results have potential implications for the development of synthetic biology, molecular computing and artificial intelligence.

preprint · 2022 · arXiv

VAC-CNN: A Visual Analytics System for Comparative Studies of Deep Convolutional Neural Networks

The rapid development of Convolutional Neural Networks (CNNs) in recent years has triggered significant breakthroughs in many machine learning (ML) applications. The ability to understand and compare the various CNN models available is thus essential. The conventional approach of visualizing each model's quantitative features, such as classification accuracy and computational complexity, is not sufficient for a deeper understanding and comparison of the behaviors of different models. Moreover, most of the existing tools for assessing CNN behaviors only support comparison between two models and lack the flexibility of customizing the analysis tasks according to user needs. This paper presents a visual analytics system, VAC-CNN (Visual Analytics for Comparing CNNs), that supports the in-depth inspection of a single CNN model as well as comparative studies of two or more models. The ability to compare a larger number of (e.g., tens of) models especially distinguishes our system from previous ones. With carefully designed model visualization and explanation support, VAC-CNN facilitates a highly interactive workflow that promptly presents both quantitative and qualitative information at each analysis stage. We demonstrate VAC-CNN's effectiveness for assisting novice ML practitioners in evaluating and comparing multiple CNN models through two use cases and one preliminary evaluation study using the image classification tasks on the ImageNet dataset.

preprint · 2021 · arXiv

ConceptScope: Organizing and Visualizing Knowledge in Documents based on Domain Ontology

Current text visualization techniques typically provide overviews of document content and structure using intrinsic properties such as term frequencies, co-occurrences, and sentence structures. Such visualizations lack conceptual overviews incorporating domain-relevant knowledge, needed when examining documents such as research articles or technical reports. To address this shortcoming, we present ConceptScope, a technique that utilizes a domain ontology to represent the conceptual relationships in a document in the form of a Bubble Treemap visualization. Multiple coordinated views of document structure and concept hierarchy with text overviews further aid document analysis. ConceptScope facilitates exploration and comparison of single and multiple documents respectively. We demonstrate ConceptScope by visualizing research articles and transcripts of technical presentations in computer science. In a comparative study with DocuBurst, a popular document visualization tool, ConceptScope was found to be more informative in exploring and comparing domain-specific documents, but less so when it came to documents that spanned multiple disciplines.

preprint · 2021 · arXiv

Consensus-Based Decentralized Energy Trading for Distributed Energy Resources

In smart grids, distributed energy resources (DERs) have penetrated residential zones to provide a new form of electricity supply, mainly from renewable energy. Residential households and commercial buildings with DERs have become prosumers in the local grids, since they can sell surplus power to others. Research has been initiated to integrate and utilize DERs through better control and communication strategies. With the advances in Internet of Things (IoT) technology, unprecedented coordination among DERs can be achieved to facilitate energy trading and transactive energy management. However, preventing leakage of users' information during the optimization process remains a challenge for researchers, which drives them to develop privacy-preserving energy management systems. In this paper, we develop a fully decentralized transactive energy management strategy using a consensus-based algorithm. Specifically, we design a virtual pool for prosumers to trade energy and exchange information with the support of IoT technologies. The consensus-based algorithm enables prosumers to obtain the optimal energy schedule independently in a coordinated manner without revealing any personal data. We use real-world data to perform simulations and validate our developed algorithm. The results show that our consensus-based decentralized transactive energy management strategy is feasible and can significantly reduce the overall system cost.
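
To give a feel for what "consensus-based" means, the sketch below runs the textbook average-consensus iteration, in which each agent repeatedly nudges its local value toward those of its neighbors. It is a generic illustration only: the paper's trading algorithm, its exchanged quantities, and its privacy mechanism are not reproduced, and the ring topology, step size `eps`, and starting values are assumptions.

```python
from typing import Dict, List


def consensus_step(values: List[float], neighbors: Dict[int, List[int]], eps: float) -> List[float]:
    """One synchronous round: each agent moves toward its neighbors' values."""
    return [v + eps * sum(values[j] - v for j in neighbors[i])
            for i, v in enumerate(values)]


# Four agents on a ring; each one only talks to its two neighbors.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = [4.0, 0.0, 2.0, 6.0]  # e.g. local estimates of a shared quantity
target = sum(values) / len(values)

for _ in range(60):
    # eps must stay below 1 / (maximum degree) for this simple scheme.
    values = consensus_step(values, ring, eps=0.25)
```

Because the updates are symmetric, the sum of all values is preserved, so every agent converges to the network-wide average without any central coordinator collecting the individual values.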

preprint · 2021 · arXiv

Interaction-aware Kalman Neural Networks for Trajectory Prediction

Forecasting the motion of surrounding obstacles (vehicles, bicycles, pedestrians, etc.) benefits on-road motion planning for intelligent and autonomous vehicles. Complex scenes always pose great challenges in modeling the patterns of surrounding traffic. For example, one main challenge comes from the intractable interaction effects in a complex traffic system. In this paper, we propose a multi-layer architecture, Interaction-aware Kalman Neural Networks (IaKNN), which involves an interaction layer for resolving high-dimensional traffic environmental observations as interaction-aware accelerations, a motion layer for transforming the accelerations to interaction-aware trajectories, and a filter layer for estimating future trajectories with a Kalman filter network. Drawing on multiple traffic data sources, our end-to-end trainable approach fuses dynamic and interaction-aware trajectories, boosting prediction performance. Experiments on the NGSIM dataset demonstrate that IaKNN outperforms the state-of-the-art methods in terms of effectiveness for traffic trajectory prediction.
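
The filter layer builds on the Kalman filter, whose predict and update cycle can be sketched in one dimension as follows. This is the textbook scalar filter under an assumed random-walk state model, not the learned Kalman filter network from the paper. The function name `kalman_step` and the noise values `q` and `r` are illustrative assumptions.

```python
from typing import Tuple


def kalman_step(x: float, p: float, z: float, q: float, r: float) -> Tuple[float, float]:
    """One predict/update cycle of a scalar Kalman filter.

    x, p: prior state estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances
    """
    # Predict: under a random-walk model the mean is unchanged, uncertainty grows by q.
    p_pred = p + q
    # Update: blend the prediction and the measurement according to the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x + gain * (z - x)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


state, variance = 0.0, 1.0
for measurement in [2.0, 2.0, 2.0]:
    state, variance = kalman_step(state, variance, measurement, q=0.0, r=1.0)
```

Each measurement pulls the estimate toward 2.0 while the variance shrinks, so later measurements move the estimate less than earlier ones.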

preprint · 2021 · arXiv

Modal-Adaptive Gated Recoding Network for RGB-D Salient Object Detection

The multi-modal salient object detection model based on RGB-D information has better robustness in the real world. However, it remains nontrivial to adaptively balance effective multi-modal information in the feature fusion phase. In this letter, we propose a novel gated recoding network (GRNet) to evaluate the information validity of the two modes and balance their influence. Our framework is divided into three phases: a perception phase, a recoding mixing phase and a feature integration phase. First, a perception encoder is adopted to extract multi-level single-modal features, which lays the foundation for multi-modal semantic comparative analysis. Then, a modal-adaptive gate unit (MGU) is proposed to suppress the invalid information and transfer the effective modal features to the recoding mixer and the hybrid branch decoder. The recoding mixer is responsible for recoding and mixing the balanced multi-modal information. Finally, the hybrid branch decoder completes the multi-level feature integration under the guidance of an optional edge guidance stream (OEGS). Experiments and analysis on eight popular benchmarks verify that our framework performs favorably against 9 state-of-the-art methods.

preprint · 2020 · arXiv

Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

Deep neural networks are being widely deployed for many critical tasks due to their high classification accuracy. In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models. These malicious behaviors can be triggered at the adversary's will and hence pose a serious threat to the widespread deployment of deep models. We propose a method to verify if a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients. Inserting backdoors into a network alters its decision boundaries, which are effectively encoded in its adversarial perturbations. We train a two-stream network for Trojan detection from its global ($L_\infty$ and $L_2$ bounded) perturbations and the localized region of high energy within each perturbation. The former encodes the decision boundaries of the network and the latter encodes the unknown trigger shape. We also propose an anomaly detection method to identify the target class in a Trojaned network. Our methods are invariant to the trigger type, trigger size, training data and network architecture. We evaluate our methods on MNIST, NIST-Round0 and NIST-Round1 datasets, with up to 1,000 pre-trained models, making this the largest study to date on Trojaned network detection, and achieve over 92% detection accuracy to set the new state-of-the-art.

preprint · 2020 · arXiv

Dependence of the Inverse Spin Hall Effect in Sr(Nb$_x$Ti$_{1-x}$)O$_3$ on the Nb concentration

We measured the spin rectification effect and the inverse spin Hall effect in Nb-doped SrTiO$_3$ by injecting spins from ferromagnetic thin films into SrTiO$_3$ using spin pumping. The spin injection increases as the doping level increases. However, the spin Hall angle decreases when Nb occupies more than $\sim$2% of the Ti sites, because beyond this concentration the electrons contributing to the spin Hall effect come from the $d$-orbitals of Nb rather than those of Ti. Our work points to the importance of orbital occupations in the (inverse) spin Hall effect and suggests a route to controllable spin and charge interconversion in oxide spintronics.

preprint · 2020 · arXiv

Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

Click-through rate (CTR) prediction is an essential task in web applications such as online advertising and recommender systems, whose features are usually in multi-field form. The key to this task is to model feature interactions among different feature fields. Recently proposed deep learning based models follow a general paradigm: raw sparse input multi-field features are first mapped into dense field embedding vectors, and then simply concatenated together to feed into deep neural networks (DNN) or other specifically designed networks to learn high-order feature interactions. However, the simple unstructured combination of feature fields will inevitably limit the capability to model sophisticated interactions among different fields in a sufficiently flexible and explicit fashion. In this work, we propose to represent the multi-field features in a graph structure intuitively, where each node corresponds to a feature field and different fields can interact through edges. The task of modeling feature interactions can thus be converted to modeling node interactions on the corresponding graph. To this end, we design a novel model, Feature Interaction Graph Neural Networks (Fi-GNN). Taking advantage of the strong representative power of graphs, our proposed model can not only model sophisticated feature interactions in a flexible and explicit fashion, but also provide good model explanations for CTR prediction. Experimental results on two real-world datasets show its superiority over state-of-the-art methods.

preprint · 2020 · arXiv

Object-oriented SLAM using Quadrics and Symmetry Properties for Indoor Environments

Targeting the application environment of indoor mobile robots, this paper proposes a sparse object-level SLAM algorithm based on an RGB-D camera. A quadric representation is used as a landmark to compactly model objects, including their position, orientation, and occupied space. The state-of-the-art quadric-based SLAM algorithm faces an observability problem caused by the limited perspective under the planar trajectory of the mobile robot. To solve this problem, the proposed algorithm fuses object detection and point cloud data to estimate the quadric parameters. It completes the quadric initialization from a single frame of RGB-D data, which significantly reduces the requirements for perspective changes. As objects are often only partially observed, the proposed algorithm uses the symmetry properties of indoor artificial objects to estimate the occluded parts and obtain more accurate quadric parameters. Experiments show that, compared with the state-of-the-art algorithm, and especially on the forward trajectory of mobile robots, the proposed algorithm significantly improves the accuracy and convergence speed of quadric reconstruction. Finally, we make available an open-source implementation to replicate the experiments.

preprint · 2019 · arXiv

Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. The "dimensionality curse" makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be automatically formed without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which shows better performance than other existing methods. The OmiVAE model learned from multi-omics data outperformed that using only one type of omics data, which indicates that the complementary information from different omics datatypes provides useful insights for biomedical tasks like cancer classification.

preprint · 2019 · arXiv

Optimal design of shell-graded-infill structures by a hybrid MMC-MMV approach

In the present work, a hybrid MMC-MMV approach is developed for designing additive manufacturing-oriented shell-graded-infill structures. The key idea is to describe the geometry of a shell-graded-infill structure explicitly using a set of geometry parameters. To this end, a set of morphable voids is adopted to describe the boundary of the coating shell, while a set of morphable components combined with a coordinate perturbation technique is introduced to represent the graded infill distribution. Under such treatment, both the crisp boundary of the coating shell and the graded infill can be optimized simultaneously, with a small number of design variables. Numerical examples demonstrate the effectiveness of the proposed approach.

preprint · 2016 · arXiv

A Self-Paced Regularization Framework for Multi-Label Learning

In this paper, we propose a novel multi-label learning framework, called Multi-Label Self-Paced Learning (MLSPL), in an attempt to incorporate the self-paced learning strategy into the multi-label learning regime. In light of the benefits of the easy-to-hard strategy proposed by self-paced learning, MLSPL aims to learn multiple labels jointly by gradually including label learning tasks and instances into model training, from easy to hard. We first introduce a self-paced function as a regularizer in the multi-label learning formulation, so as to simultaneously rank the priorities of the label learning tasks and the instances in each learning iteration. Considering that different multi-label learning scenarios often need different self-paced schemes during optimization, we propose a general way to find the desired self-paced functions. Experimental results on three benchmark datasets suggest the state-of-the-art performance of our approach.

preprint · 2016 · arXiv

Flexible Pilot Contamination Mitigation with Doppler PSD Alignment

Pilot contamination in the uplink (UL) can severely degrade the channel estimation quality at the base station (BS) in a massive multi-input multi-output (MIMO) system. Thus, it is critical to explore all possible avenues to enable more orthogonal resources for the users to transmit non-interfering UL pilots. In conventional designs, pilot orthogonality typically assumes constant channel gains over time, which limits the amount of orthogonal resources in the case of time-selective channels. To circumvent this constraint, in this paper, we show how to enable orthogonal multiplexing of pilots in the case of Doppler fading by aligning the power spectral densities (PSD) of different users. From the derived PSD alignment rules, we see that multiple users can be sounded simultaneously without creating or suffering pilot contamination, even when these users are experiencing time-varying channels. Furthermore, we provide analytical formulas characterizing the channel estimation mean square error (MSE) performance. Computer simulations further confirm that PSD alignment can serve as an important decontamination mechanism for the UL pilots in massive MIMO.

preprint · 2016 · arXiv

Pilot Decontamination via PDP Alignment

In this paper, we look into the issue of intra-cell uplink (UL) pilot orthogonalization and schemes for mitigating inter-cell pilot contamination with a realistic massive multi-input multi-output (MIMO) orthogonal frequency-division multiplexing (OFDM) system model. First, we show how to align the power-delay profiles (PDP) of different users served by one BS so that the pilots sent within one common OFDM symbol are orthogonal. From the derived alignment rule, we see that many more users can be sounded in the same OFDM symbol, as their channels are sparse in time. Second, in the case of massive MIMO, we show how PDP alignment can help to alleviate the pilot contamination due to inter-cell interference. We demonstrate that, by utilizing the fact that different paths in time are associated with different angles of arrival (AoA), the pilot contamination can be significantly reduced by appropriately aligning the PDPs of the users served by different BSs. Computer simulations further convince us that PDP alignment can serve as a new baseline design philosophy for the UL pilots in massive MIMO.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2506.01482:author:4:xiaoyu-zhang

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.18610:author:4:xiaoyu-zhang

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.03660:author:4:xiaoyu-zhang

Imported May 20, 2026 · Synced May 20, 2026

arxiv · confidence 95%

external id: arxiv:2605.16927:author:10:xiaoyu-zhang

Imported May 20, 2026 · Synced May 20, 2026

3 works

Di Wang

Researcher

2 works

Chuanhou Gao

Researcher

2 works

Dian Jin

Researcher

2 works

Feng Dong

Researcher

30 published items

A Hybrid Framework Combining Autoregression and Common Factors for Matrix Time Series

CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

From Static Risk to Dynamic Trajectories: Toward World-Model-Inspired Clinical Prediction

Minimax Optimal Robust Sparse Regression with Heavy-Tailed Designs: A Gradient-Based Approach

Reduced-Rank Autoregressive Model for High-Dimensional Multivariate Network Time Series

Stage Light is Sequence$^2$: Multi-Light Control via Imitation Learning

Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?

DREAM: Debugging and Repairing AutoML Pipelines

Entropy-driven order in an array of nanomagnets

MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing

Nonparametric Quantile Regression for Homogeneity Pursuit in Panel Data Models

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

On Stability of Two Kinds of Delayed Chemical Reaction Networks

Perception-and-Regulation Network for Salient Object Detection

Squeezed metallic droplet with tunable Kubo gap and charge injection in transition metal dichalcogenides

Towards Programming Adaptive Linear Neural Networks Through Chemical Reaction Networks

VAC-CNN: A Visual Analytics System for Comparative Studies of Deep Convolutional Neural Networks

ConceptScope: Organizing and Visualizing Knowledge in Documents based on Domain Ontology

Consensus-Based Decentralized Energy Trading for Distributed Energy Resources

Interaction-aware Kalman Neural Networks for Trajectory Prediction

Modal-Adaptive Gated Recoding Network for RGB-D Salient Object Detection

Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

Dependence of the Inverse Spin Hall Effect in Sr(Nb$_x$Ti$_{1-x}$)O$_3$ on the Nb concentration

Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

Object-oriented SLAM using Quadrics and Symmetry Properties for Indoor Environments

Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

Optimal design of shell-graded-infill structures by a hybrid MMC-MMV approach

A Self-Paced Regularization Framework for Multi-Label Learning

Flexible Pilot Contamination Mitigation with Doppler PSD Alignment

Pilot Decontamination via PDP Alignment