Researcher profile

Yang Shen

Yang Shen contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
15 works
0 followers
14 topics
4 close collaborators

Published work

15 published items

preprint, 2026, arXiv

Charge disproportionation as a possible mechanism towards polar antiferromagnetic metal in molecular orbital crystal

Polar antiferromagnetic metals have recently garnered increasing interest due to their combined traits of both ferromagnets and antiferromagnets for spintronic applications. However, the inherently incompatible nature of antiferromagnetism, metallicity and polarity poses a significant challenge. We propose that charge disproportionation can lead to this novel state in the negative charge transfer gap regime in molecular orbital crystals, based on molecular orbital analyses of the first-principles DFT+$U$ electronic band structure of the representative Ruddlesden-Popper bilayer perovskite oxide Sr$_3$Co$_2$O$_7$, corroborated by Density Matrix Renormalization Group calculations. Owing to the negative charge transfer nature of Co$^{4+}$ and to the strong interlayer coupling, localized molecular orbitals stemming from the hybridization of Co $d_{z^2}$ and $d_{xz/yz}$ orbitals through the apical oxygen $p$ orbitals preferentially emerge within each bilayer unit and develop antiferromagnetic ordering through Hubbard repulsion. Charge disproportionation driven by Hund's physics creates an occupation imbalance with broken inversion symmetry in the remaining $d_{xy}$ and $d_{x^2-y^2}$ orbitals from distinct Co atoms within the bilayer unit, resulting in the polar metallicity. Meanwhile, this charge disproportionation scenario allows the resulting conduction carriers to couple with interlayer local spins via Hund's coupling, giving rise to in-plane double-exchange ferromagnetism. Our molecular orbital formulation further provides a guide towards an effective Hamiltonian for modelling the unconventional synergy of metallicity, polarity and antiferromagnetism in Sr$_3$Co$_2$O$_7$, which may be a unified framework widely applicable to double-layer Ruddlesden-Popper perovskite oxides.

preprint, 2026, arXiv

Learning to Perceive "Where": Spatial Pretext Tasks for Robust Self-Supervised Learning

Existing self-supervised learning (SSL) methods primarily learn object-invariant representations but often neglect the spatial structure and relationships among object parts. To address this limitation, we introduce Spatial Prediction (SP), a spatially aware pretext regression task that predicts the relative position and scale between a pair of disentangled local views from the same image. By modeling part-to-part relationships in a continuous geometric space, SP encourages representations to capture fine-grained spatial dependencies beyond invariant categorical semantics, thereby learning the compositional structure of visual scenes. SP is implemented as a decoupled plug-in and can be seamlessly integrated into diverse SSL frameworks. Extensive experiments show consistent improvements across image recognition, fine-grained classification, semantic segmentation, and depth estimation, as well as substantial gains in out-of-distribution robustness for object recognition. To evaluate spatial reasoning, we introduce (1) a position and scale prediction task on image patch pairs and (2) a jigsaw understanding task requiring patch reordering and recognition after reconstruction. Strong performance on these tasks indicates improved spatial structure and geometric awareness. Overall, explicitly modeling spatial information provides an effective inductive bias for SSL, leading to more structured representations and better generalization. Code and models will be released.
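The regression target of a pretext task like the one described above can be illustrated with a small sketch. The abstract does not give the exact parameterization, so the one below (center offsets normalized by the reference view's size, plus log size ratios, as commonly used in bounding-box regression) is an assumption, and the names `Box` and `spatial_target` are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A local view (crop) in normalized image coordinates. Hypothetical helper type."""
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height


def spatial_target(a: Box, b: Box) -> Tuple[float, float, float, float]:
    """Relative position and scale of view b with respect to view a.

    Assumed parameterization (not taken from the paper): center offsets
    normalized by the size of a, and log ratios of the sizes.
    """
    dx = (b.cx - a.cx) / a.w
    dy = (b.cy - a.cy) / a.h
    dw = math.log(b.w / a.w)
    dh = math.log(b.h / a.h)
    return dx, dy, dw, dh


# Two crops of the same image: b lies to the right of a and is twice as wide.
view_a = Box(cx=0.5, cy=0.5, w=0.2, h=0.2)
view_b = Box(cx=0.6, cy=0.5, w=0.4, h=0.2)
target = spatial_target(view_a, view_b)
```

In a setup of this kind, a small regression head would take the embeddings of both views and be trained to predict `target`, which is what forces the representation to keep spatial information.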

preprint, 2026, arXiv

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompass sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets. We will release all datasets and the benchmark codebase at https://huggingface.co/datasets/yanglei18/V2X-Radar and https://github.com/yanglei18/V2X-Radar.

preprint, 2025, arXiv

Interaction-Driven Chern Insulator at Zero Electric Field in ABCB-Stacked Tetralayer Graphene

ABCB-stacked tetralayer graphene, with intrinsic spontaneous polarization, offers a unique platform to explore electron correlation effects, whose interplay with spin-orbit coupling (SOC) may engender topological phases. Here, employing a $\mathbf{k}\cdot\mathbf{p}$ model with self-consistent Hartree-Fock calculations, we investigate its electronic ground states. Remarkably, we find that the intrinsic polarization, in conjunction with strong interactions ($U=8 \text{ eV}$) and SOC, is sufficient to drive a $C=3$ quantum anomalous Hall state, obviating the need for an external electric field typical in ABCA stacks. Conversely, at moderate interactions ($U=6 \text{ eV}$), a minimal electric field is necessary. Furthermore, calculations predict other correlation-driven metallic phases such as quarter- and three-quarter-filled states. These results establish that the synergy of intrinsic polarization, correlations, and SOC governs the rich topological phenomena, suggesting ABCB-stacked graphene as a highly tunable platform for exploring emergent topological phenomena.

preprint, 2024, arXiv

From Function to Distribution Modeling: A PAC-Generative Approach to Offline Optimization

This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of "offline" data examples. While recent years have seen a flurry of work on applying various machine learning techniques to the offline optimization problem, the majority of this work has focused on learning a surrogate of the unknown objective function and then applying existing optimization algorithms. While the idea of modeling the unknown objective function is intuitive and appealing, from the learning point of view it also makes it very difficult to tune the objective of the learner according to the objective of optimization. Instead of learning and then optimizing the unknown objective function, in this paper we take a less intuitive but more direct view that optimization can be thought of as a process of sampling from a generative model. To learn an effective generative model from the offline data examples, we consider the standard technique of "re-weighting", and our main technical contribution is a probably approximately correct (PAC) lower bound on the natural optimization objective, which allows us to jointly learn a weight function and a score-based generative model. The robustly competitive performance of the proposed approach is demonstrated via empirical studies using the standard offline optimization benchmarks.
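The general idea of re-weighting offline examples can be sketched as follows. This is not the paper's method, which learns the weight function jointly with a score-based generative model; the sketch uses a fixed exponential (softmax) weighting instead, and it assumes that larger objective values are better. The function name `reweight` and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def reweight(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn offline objective values into normalized sampling weights.

    Assumption: higher score is better. A fixed softmax weighting is used
    here purely for illustration; it is not a learned weight function.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Offline examples with measured objective values 1.0, 2.0 and 3.0.
weights = reweight([1.0, 2.0, 3.0])
```

A generative model trained on the examples with these weights sees high-scoring designs more often, so sampling from it leans towards the region the optimizer cares about. Lowering `temperature` concentrates the weight on the best examples.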

preprint, 2022, arXiv

Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations

Self-supervision has recently been surging at its new frontier of graph learning. It facilitates graph representations beneficial to downstream tasks, but its success could hinge on domain knowledge for handcrafting or on often expensive trial and error. Even its state-of-the-art representative, graph contrastive learning (GraphCL), is not completely free of those needs, as GraphCL uses a prefabricated prior reflected by the ad-hoc manual selection of graph data augmentations. Our work aims at advancing GraphCL by answering the following questions: How to represent the space of graph augmented views? What principle can be relied upon to learn a prior in that space? And what framework can be constructed to learn the prior in tandem with contrastive learning? Accordingly, we have extended the prefabricated discrete prior in the augmentation set to a learnable continuous prior in the parameter space of graph generators, assuming that graph priors per se, similar to the concept of image manifolds, can be learned by data generation. Furthermore, to form contrastive views without collapsing to trivial solutions due to the prior learnability, we have leveraged both principles of information minimization (InfoMin) and information bottleneck (InfoBN) to regularize the learned priors. Eventually, contrastive learning, InfoMin, and InfoBN are incorporated organically into one framework of bi-level optimization. Our principled and automated approach has proven to be competitive against the state-of-the-art graph self-supervision methods, including GraphCL, on benchmarks of small graphs, and has shown even better generalizability on large-scale graphs, without resorting to human expertise or downstream validation. Our code is publicly released at https://github.com/Shen-Lab/GraphCL_Automated.

preprint, 2022, arXiv

Cone-constrained Monotone Mean-Variance Portfolio Selection Under Diffusion Models

We consider monotone mean-variance (MMV) portfolio selection problems with a conic convex constraint under diffusion models, and their counterpart problems under mean-variance (MV) preferences. We obtain the precommitted optimal strategies to both problems in closed form and find that they coincide, both with and without the conic constraint. This result generalizes the equivalence between MMV and MV preferences from non-constrained cases to a specific constrained case. A comparison analysis reveals that the orthogonality property under the conic convex set is key to ensuring the equivalence result.

preprint, 2022, arXiv

Microscopic study of optically-stable, coherent color centers in diamond generated by high-temperature annealing

Single color centers in solids have emerged as promising physical platforms for quantum information science. Creating these centers with excellent quantum properties is a key foundation for further technological developments. In particular, a microscopic understanding of the spin bath environments is key to engineering color centers for quantum control. In this work, we propose and demonstrate a distinct high-temperature annealing (HTA) approach for creating high-quality nitrogen-vacancy (NV) centers in implantation-free diamonds. Simultaneously, using the created NV centers as probes of their local environment, we verify that no damage was microscopically induced by the HTA. Nearly all single NV centers created in ultra-low-nitrogen-concentration membranes possess stable and Fourier-transform-limited optical spectra. Furthermore, HTA strongly reduces noise sources naturally grown in ensemble samples, and leads to more than three-fold improvements in decoherence time and sensitivity. We also verify that vacancy activation and defect reformation, especially of H3 and P1 centers, can explain the reconfiguration between spin baths and color centers. This novel approach will become a powerful tool in vacancy-based quantum technology.

preprint, 2022, arXiv

Spectral Thermal Spreading Resistance of Wide Bandgap Semiconductors in Ballistic-Diffusive Regime

To develop efficient thermal management strategies for wide bandgap (WBG) semiconductor devices, it is essential to have a clear understanding of the heat transport process within the device and to accurately predict the junction temperature. In this paper, we used the phonon Monte Carlo (MC) method with the phonon dispersion of various typical WBG semiconductors, including GaN, SiC, AlN, and β-Ga$_2$O$_3$, to investigate the thermal spreading resistance in the ballistic-diffusive regime. It was found that, compared with Fourier's law-based predictions, the increase in the thermal resistance caused by ballistic effects was strongly related to the different phonon dispersions. Based on the model deduced under the gray-medium approximation and the results of dispersion MC, we obtained a thermal resistance model that can address thermal spreading, ballistic effects, and the influence of phonon dispersion. The model can be easily coupled with FEM-based thermal analysis and applied to different materials. This paper provides a clearer understanding of the influence of phonon dispersion on the thermal transport process, and it can be useful for the prediction of junction temperatures and the development of thermal management strategies for WBG semiconductor devices.

preprint, 2022, arXiv

Thermal spreading resistance of GaN HEMTs with heat source heating studied by hybrid Monte Carlo-diffusion simulations

Exact assessment of thermal spreading resistance is of great importance to the thermal management of electronic devices, especially when fully considering the heat conduction process from the nanoscale heat source to the macroscopic heat sink. The existing simulation methods are either based on conventional Fourier's law or limited to small system sizes, making it difficult to accurately and efficiently study cross-scale heat transfer. In this paper, a hybrid phonon Monte Carlo-diffusion method that couples the phonon Monte Carlo (MC) method with Fourier's law by dividing the computational domain is adopted to analyze thermal spreading resistance in the ballistic-diffusive regime. Compared with phonon MC simulation, the junction temperature of the hybrid method has the same precision, while the time cost can be reduced by up to two orders of magnitude. Furthermore, the simulation results indicate that the heating scheme has a remarkable impact on phonon transport. The thermal resistance of the heat source (HS) scheme can be larger than that of the heat flux (HF) scheme, which is opposite to the prediction of Fourier's law. In the HS scheme, the enhanced phonon-boundary scattering counteracts the broadening of the heat source, leading to a stronger ballistic effect as the heat source thickness decreases. The conclusion is verified by a one-dimensional thermal resistance model. This work opens up an opportunity for fast and extensive thermal modeling of cross-scale heat transfer in electronic devices and highlights the influence of heating schemes.

preprint, 2021, arXiv

Mean-Variance Investment and Risk Control Strategies -- A Time-Consistent Approach via A Forward Auxiliary Process

We consider an optimal investment and risk control problem for an insurer under the mean-variance (MV) criterion. By introducing a deterministic auxiliary process defined forward in time, we formulate an alternative time-consistent problem related to the original MV problem, and obtain the optimal strategy and the value function of the new problem in closed form. We compare our formulation and optimal strategy to those under the precommitment and game-theoretic frameworks. Numerical studies show that, when the financial market is negatively correlated with the risk process, optimal investment may involve short selling the risky asset and, if that happens, a less risk-averse insurer short sells more of the risky asset.

preprint, 2020, arXiv

L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks

Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets. They need to compute node representations recursively from their neighbors. Current GCN training algorithms suffer from either high computational costs that grow exponentially with the number of layers, or high memory usage for loading the entire graph and node embeddings. In this paper, we propose a novel efficient layer-wise training framework for GCN (L-GCN) that disentangles feature aggregation and feature transformation during training, hence greatly reducing time and memory complexities. We present a theoretical analysis of L-GCN under the graph isomorphism framework, showing that L-GCN leads to GCNs as powerful as those from the more costly conventional training algorithm, under mild conditions. We further propose L$^2$-GCN, which learns a controller for each layer that can automatically adjust the training epochs per layer in L-GCN. Experiments show that L-GCN is faster than state-of-the-art methods by at least an order of magnitude, with consistent memory usage that does not depend on dataset size, while maintaining comparable prediction performance. With the learned controller, L$^2$-GCN can further cut the training time in half. Our code is available at https://github.com/Shen-Lab/L2-GCN.

preprint, 2020, arXiv

Network-principled deep generative models for designing drug combinations as graph sets

Combination therapy has been shown to improve therapeutic efficacy while reducing side effects. Importantly, it has become an indispensable strategy to overcome resistance in antibiotics, anti-microbials, and anti-cancer drugs. Facing enormous chemical space and unclear design principles for small-molecule combinations, computational drug-combination design has not seen generative models meet their potential to accelerate resistance-overcoming drug combination discovery. We have developed the first deep generative model for drug combination design, by jointly embedding graph-structured domain knowledge and iteratively training a reinforcement learning-based chemical graph-set designer. First, we have developed Hierarchical Variational Graph Auto-Encoders (HVGAE) trained end-to-end to jointly embed gene-gene, gene-disease, and disease-disease networks. Novel attentional pooling is introduced here for learning disease representations from associated genes' representations. Second, targeting diseases in learned representations, we have recast the drug-combination design problem as graph-set generation and developed a deep learning-based model with novel rewards. Specifically, besides chemical validity rewards, we have introduced a novel generative adversarial reward, being generalized sliced Wasserstein, for chemically diverse molecules with distributions similar to known drugs. We have also designed a network principle-based reward for drug combinations. Numerical results indicate that, compared to graph embedding methods, HVGAE learns more informative and generalizable disease representations. Case studies on four diseases show that network-principled drug combinations tend to have low toxicity. The generated drug combinations collectively cover the disease module similarly to FDA-approved drug combinations and could potentially suggest novel systems-pharmacology strategies.

preprint, 2020, arXiv

When Does Self-Supervision Help Graph Convolutional Networks?

Self-supervision as an emerging technique has been employed to train convolutional neural networks (CNNs) for more transferable, generalizable, and robust representation learning of images. Its introduction to graph convolutional networks (GCNs) operating on graph data is, however, rarely explored. In this study, we report the first systematic exploration and assessment of incorporating self-supervision into GCNs. We first elaborate three mechanisms to incorporate self-supervision into GCNs, analyze the limitations of pretraining & finetuning and self-training, and proceed to focus on multi-task learning. Moreover, we propose to investigate three novel self-supervised learning tasks for GCNs with theoretical rationales and numerical comparisons. Lastly, we further integrate multi-task self-supervision into graph adversarial training. Our results show that, with properly designed task forms and incorporation mechanisms, self-supervision benefits GCNs in gaining more generalizability and robustness. Our code is available at https://github.com/Shen-Lab/SS-GCNs.

preprint, 2019, arXiv

Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts

Predicting compound-protein affinity is critical for accelerating drug discovery. Recent progress made by machine learning focuses on accuracy but leaves much to be desired for interpretability. Through molecular contacts underlying affinities, our large-scale interpretability assessment finds commonly-used attention mechanisms inadequate. We thus formulate a hierarchical multi-objective learning problem whose predicted contacts form the basis for predicted affinities. We further design a physics-inspired deep relational network, DeepRelations, with intrinsically explainable architecture. Specifically, various atomic-level contacts or "relations" lead to molecular-level affinity prediction. And the embedded attentions are regularized with predicted structural contexts and supervised with partially available training contacts. DeepRelations shows superior interpretability to the state-of-the-art: without compromising affinity prediction, it boosts the AUPRC of contact prediction 9.5, 16.9, 19.3 and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively. Our study represents the first dedicated model development and systematic model assessment for interpretable machine learning of compound-protein affinity.