Researcher profile

Yifei Shen

Yifei Shen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL

Real-world clinical text-to-SQL requires reasoning over heterogeneous EHR tables, temporal windows, and patient-similarity cohorts to produce executable queries. We introduce CLINSQL, a benchmark of 633 expert-annotated tasks on MIMIC-IV v3.1 that demands multi-table joins, clinically meaningful filters, and executable SQL. Solving CLINSQL entails navigating schema metadata and clinical coding systems, handling long contexts, and composing multi-step queries beyond traditional text-to-SQL. We evaluate 22 proprietary and open-source models under Chain-of-Thought self-refinement and use rubric-based SQL analysis with execution checks that prioritize critical clinical requirements. Despite recent advances, performance remains far from clinical reliability: on the test set, GPT-5-mini attains 74.7% execution score, DeepSeek-R1 leads open-source at 69.2% and Gemini-2.5-Pro drops from 85.5% on Easy to 67.2% on Hard. Progress on CLINSQL marks tangible advances toward clinically reliable text-to-SQL for real-world EHR analytics.

preprint2023arXiv

Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation. With these simple modifications, Graphormer could attain better results on large-scale molecular modeling datasets than the vanilla one, and the performance gain could be consistently obtained on 2D and 3D molecular graph modeling tasks. In addition, we show that with a global receptive field and an adaptive aggregation strategy, Graphormer is more powerful than classic message-passing-based GNNs. Empirically, Graphormer could achieve much less MAE than the originally reported results on the PCQM4M quantum chemistry dataset used in KDD Cup 2021. In the meanwhile, it greatly outperforms the competitors in the recent Open Catalyst Challenge, which is a competition track on NeurIPS 2021 workshop, and aims to model the catalyst-adsorbate reaction system with advanced AI models. All codes could be found at https://github.com/Microsoft/Graphormer.

preprint2022arXiv

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation. The "Graphormer-V2" could attain better results on large-scale molecular modeling datasets than the vanilla one, and the performance gain could be consistently obtained on downstream tasks. In addition, we show that with a global receptive field and an adaptive aggregation strategy, Graphormer is more powerful than classic message-passing-based GNNs. Graphormer-V2 achieves much less MAE than the vanilla Graphormer on the PCQM4M quantum chemistry dataset used in KDD Cup 2021, where the latter one won the first place in this competition. In the meanwhile, Graphormer-V2 greatly outperforms the competitors in the recent Open Catalyst Challenge, which is a competition track on NeurIPS 2021 workshop, and aims to model the catalyst-adsorbate reaction system with advanced AI models. All models could be found at \url{https://github.com/Microsoft/Graphormer}.

preprint2022arXiv

Belief-selective Propagation Detection for MIMO Systems

Compared to the linear MIMO detectors, the Belief Propagation (BP) detector has shown greater capabilities in achieving near optimal performance and better nature to iteratively cooperate with channel decoders. Aiming at real applications, recent works mainly fall into the category of reducing the complexity by simplified calculations, at the expense of performance sacrifice. However, the complexity is still unsatisfactory with exponentially increasing complexity or required exponentiation operations. Furthermore, due to the inherent loopy structure, the existing BP detectors persistently encounter error floor in high signal-to-noise ratio (SNR) region, which becomes even worse with calculation approximation. This work aims at a revised BP detector, named {Belief-selective Propagation (BsP)} detector by selectively utilizing the \emph{trusted} incoming messages with sufficiently large \textit{a priori} probabilities for updates. Two proposed strategies: symbol-based truncation (ST) and edge-based simplification (ES) squeeze the complexity (orders lower than the Original-BP), while greatly relieving the error floor issue over a wide range of antenna and modulation combinations. For the $16$-QAM $8 \times 4$ MIMO system, the $\mathcal{B}(1,1)$ {BsP} detector achieves more than $4$\,dB performance gain (@$\text{BER}=10^{-4}$) with roughly $4$ orders lower complexity than the Original-BP detector. Trade-off between performance and complexity towards different application requirement can be conveniently obtained by configuring the ST and ES parameters.

preprint2022arXiv

How Neural Architectures Affect Deep Learning for Communication Networks?

In recent years, there has been a surge in applying deep learning to various challenging design problems in communication networks. The early attempts adopt neural architectures inherited from applications such as computer vision, which suffer from poor generalization, scalability, and lack of interpretability. To tackle these issues, domain knowledge has been integrated into the neural architecture design, which achieves near-optimal performance in large-scale networks and generalizes well under different system settings. This paper endeavors to theoretically validate the importance and effects of neural architectures when applying deep learning to design communication networks. We prove that by exploiting permutation invariance, a common property in communication networks, graph neural networks (GNNs) converge faster and generalize better than fully connected multi-layer perceptrons (MLPs), especially when the number of nodes (e.g., users, base stations, or antennas) is large. Specifically, we prove that under common assumptions, for a communication network with $n$ nodes, GNNs converge $O(n \log n)$ times faster and their generalization error is $O(n)$ times lower, compared with MLPs.

preprint2022arXiv

Invariant Information Bottleneck for Domain Generalization

Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, the loss function is difficult to optimize for nonlinear classifiers and the original optimization objective could fail when pseudo-invariant features and geometric skews exist. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed invariant information bottleneck (IIB). IIB aims at minimizing invariant risks for nonlinear classifiers and simultaneously mitigating the impact of pseudo-invariant features and geometric skews. Specifically, we first present a novel formulation for invariant causal prediction via mutual information. Then we adopt the variational formulation of the mutual information to develop a tractable loss function for nonlinear classifiers. To overcome the failure modes of IRM, we propose to minimize the mutual information between the inputs and the corresponding representations. IIB significantly outperforms IRM on synthetic datasets, where the pseudo-invariant features and geometric skews occur, showing the effectiveness of proposed formulation in overcoming failure modes of IRM. Furthermore, experiments on DomainBed show that IIB outperforms $13$ baselines by $0.9\%$ on average across $7$ real datasets.

preprint2022arXiv

Neural Piecewise-Constant Delay Differential Equations

Continuous-depth neural networks, such as the Neural Ordinary Differential Equations (ODEs), have aroused a great deal of interest from the communities of machine learning and data science in recent years, which bridge the connection between deep neural networks and dynamical systems. In this article, we introduce a new sort of continuous-depth neural network, called the Neural Piecewise-Constant Delay Differential Equations (PCDDEs). Here, unlike the recently proposed framework of the Neural Delay Differential Equations (DDEs), we transform the single delay into the piecewise-constant delay(s). The Neural PCDDEs with such a transformation, on one hand, inherit the strength of universal approximating capability in Neural DDEs. On the other hand, the Neural PCDDEs, leveraging the contributions of the information from the multiple previous time steps, further promote the modeling capability without augmenting the network dimension. With such a promotion, we show that the Neural PCDDEs do outperform the several existing continuous-depth neural frameworks on the one-dimensional piecewise-constant delay population dynamics and real-world datasets, including MNIST, CIFAR10, and SVHN.

preprint2022arXiv

Spatiotemporal 2-D Channel Coding for Very Low Latency Reliable MIMO Transmission

To fully support vertical industries, 5G and its corresponding channel coding are expected to meet requirements of different applications. However, for applications of 5G and beyond 5G (B5G) such as URLLC, the transmission latency is required to be much shorter than that in eMBB. Therefore, the resulting channel code length reduces drastically. In this case, the traditional 1-D channel coding suffers a lot from the performance degradation and fails to deliver strong reliability with very low latency. To remove this bottleneck, new channel coding scheme beyond the existing 1-D one is in urgent need. By making full use of the spacial freedom of massive MIMO systems, this paper devotes itself in proposing a spatiotemporal 2-D channel coding for very low latency reliable transmission. For a very short time-domain code length $N^{\text{time}}=16$, $64 \times 128$ MIMO system employing the proposed spatiotemporal 2-D coding scheme successfully shows more than $3$\,dB performance gain at $\text{FER}=10^{-3}$, compared to the 1-D time-domain channel coding. It is noted that the proposed coding scheme is suitable for different channel codes and enjoys high flexibility to adapt to difference scenarios. By appropriately selecting the code rate, code length, and the number of codewords in the time and space domains, the proposed coding scheme can achieve a good trade-off between the transmission latency and reliability.

preprint2020arXiv

Complete Dictionary Learning via $\ell_p$-norm Maximization

Dictionary learning is a classic representation learning method that has been widely applied in signal processing and data analytics. In this paper, we investigate a family of $\ell_p$-norm ($p>2,p \in \mathbb{N}$) maximization approaches for the complete dictionary learning problem from theoretical and algorithmic aspects. Specifically, we prove that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present. Based on the generalized power method (GPM), an efficient algorithm is then developed for the $\ell_p$-based formulations. We further show the efficacy of the developed algorithm: for the population GPM algorithm over the sphere constraint, it first quickly enters the neighborhood of a global maximizer, and then converges linearly in this region. Extensive experiments will demonstrate that the $\ell_p$-based approaches enjoy a higher computational efficiency and better robustness than conventional approaches and $p=3$ performs the best.