Researcher profile

Yang Wei

Yang Wei contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
6 works
0 followers
4 topics
4 close collaborators

Published work

6 published items

Preprint, 2026, arXiv

PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation

PDE-to-solver code generation aims to automatically synthesize executable numerical solvers from partial differential equation (PDE) specifications. This task requires not only understanding the mathematical structure of PDEs, but also selecting appropriate discretization schemes and solver configurations, and correctly implementing the resulting formulations in finite-element method (FEM) libraries. Existing code generation benchmarks mainly evaluate syntactic correctness or success on predefined test cases. To our knowledge, there is currently no publicly available benchmark specifically for PDE-to-solver code generation, and general-purpose code benchmarks do not fully capture the unique challenges of numerical PDE solution, such as ensuring solver accuracy, efficiency, and compatibility with professional FEM libraries. We introduce PDEAgent-Bench, to the best of our knowledge the first multi-metric, multi-library benchmark for PDE-to-solver code generation. PDEAgent-Bench contains 645 instances across 6 mathematical categories and 11 PDE families, covering the common FEM libraries DOLFINx, Firedrake, and deal.II. Each instance provides an agent-facing problem specification, a reference solution on a prescribed evaluation grid, and case-specific accuracy and runtime targets. PDEAgent-Bench adopts a staged evaluation framework in which generated solvers must sequentially pass executability, numerical accuracy, and computational efficiency checks. Experiments with representative LLMs and code agents show that models can often produce runnable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced. These results indicate that current agents remain limited in producing numerically reliable and efficient PDE solvers, and that PDEAgent-Bench provides a reproducible testbed grounded in the practical requirements of numerical PDE solving.
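The staged evaluation described in this abstract (executability, then accuracy, then efficiency) can be pictured with a small sketch. This is not the benchmark's code: the function names `staged_evaluate` and `relative_l2_error`, the choice of a relative L2 error as the accuracy metric, and the wall-clock timing are all assumptions made for illustration, since the abstract does not specify them.

```python
import time
from typing import Callable, Sequence


def relative_l2_error(values: Sequence[float], reference: Sequence[float]) -> float:
    """Relative L2 error on the evaluation grid (an assumed metric)."""
    num = sum((v - r) ** 2 for v, r in zip(values, reference))
    den = sum(r ** 2 for r in reference)
    # Fall back to the absolute error when the reference is identically zero.
    return (num / den) ** 0.5 if den > 0 else num ** 0.5


def staged_evaluate(
    solver: Callable[[], Sequence[float]],
    reference: Sequence[float],
    accuracy_target: float,
    runtime_target: float,
) -> str:
    """Run the checks in order and stop at the first stage that fails."""
    # Stage 1: executability. The solver must run and return grid values.
    start = time.perf_counter()
    try:
        values = solver()
    except Exception:
        return "failed_executability"
    elapsed = time.perf_counter() - start
    if len(values) != len(reference):
        return "failed_executability"

    # Stage 2: numerical accuracy against the reference solution.
    if relative_l2_error(values, reference) > accuracy_target:
        return "failed_accuracy"

    # Stage 3: computational efficiency against the runtime target (seconds).
    if elapsed > runtime_target:
        return "failed_efficiency"

    return "passed"
```

Because the stages are sequential, a solver that runs but misses the accuracy target is never credited for being fast, which matches the abstract's observation that pass rates drop once accuracy and efficiency are enforced.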

Preprint, 2022, arXiv

LightSeq2: Accelerated Training for Transformer-based Models on GPUs

Transformer-based neural models are used in many AI applications. Training these models is expensive, as it requires substantial GPU resources and long training times. It is challenging because typical data such as sentences have variable lengths, and the Transformer's computation patterns are more complex than those of convolutional neural networks. Existing systems either focus only on model inference or optimize only BERT-like encoder models. In this paper, we present LightSeq2, a system to accelerate training for a general family of Transformer models on GPUs. We propose a series of GPU optimization techniques tailored to the specific computation flow and memory access patterns of Transformer models. LightSeq2 supports many model architectures, including BERT (encoder-only), GPT (decoder-only), Transformer (encoder-decoder), and vision Transformer. Our experiments on a variety of models and benchmarks show that LightSeq2 is consistently faster (1.4-3.5x) than previous systems on different GPUs. In particular, it gains 308% training speedup compared with existing systems on a large public machine translation benchmark (WMT14 English-German).

Preprint, 2022, arXiv

Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using any human data. Although such agents can be obtained through self-play training, they can suffer significantly from distributional shift when paired with unencountered partners, such as humans. To mitigate this distributional shift, we propose Maximum Entropy Population-based training (MEP). In MEP, agents in the population are trained with our derived Population Entropy bonus to promote both pairwise diversity between agents and individual diversity of agents themselves, and a common best agent is trained by pairing with agents in this diversified population via prioritized sampling. The prioritization is dynamically adjusted based on the training progress. We demonstrate the effectiveness of our method MEP, with comparison to Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in the Overcooked game environment, with partners being human proxy models and real humans. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
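To make the idea of a population-level entropy bonus concrete, here is a minimal sketch under a stated assumption: the bonus is taken to be the entropy of the population's averaged action distribution in a given state. The abstract does not give the formula, so this form, and the names `mean_policy`, `population_entropy`, and `average_individual_entropy`, are illustrative only.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, with 0 * log(0) treated as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mean_policy(policies: List[Sequence[float]]) -> List[float]:
    """Average the action distributions of all agents in one state."""
    n = len(policies)
    num_actions = len(policies[0])
    return [sum(p[a] for p in policies) / n for a in range(num_actions)]


def population_entropy(policies: List[Sequence[float]]) -> float:
    """Entropy of the averaged policy (the assumed form of the bonus)."""
    return entropy(mean_policy(policies))


def average_individual_entropy(policies: List[Sequence[float]]) -> float:
    """Mean of each agent's own action entropy."""
    return sum(entropy(p) for p in policies) / len(policies)
```

Under this assumption, the entropy of the averaged policy equals the average individual entropy plus the Jensen-Shannon divergence among the agents' policies, so a single quantity rewards both stochastic individual agents and agents that differ from one another. The prioritized sampling step is not covered by this sketch.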

Preprint, 2021, arXiv

In-Order Chart-Based Constituent Parsing

We propose a novel in-order chart-based model for constituent parsing. Compared with previous CKY-style and top-down models, our model gains advantages from in-order traversal of a tree (rich features, lookahead information and high efficiency) and makes better use of structural knowledge by encoding the history of decisions. Experiments on the Penn Treebank show that our model outperforms previous chart-based models and achieves competitive performance compared with other discriminative single models.

Preprint, 2020, arXiv

A Span-based Linearization for Constituent Trees

We propose a novel linearization of a constituent tree, together with a new locally normalized model. For each split point in a sentence, our model computes the normalizer on all spans ending with that split point, and then predicts a tree span from them. Compared with global models, our model is fast and parallelizable. Unlike previous local models, our linearization method is tied to the spans directly and considers more local features when performing span prediction, which is more interpretable and effective. Experiments on PTB (95.8 F1) and CTB (92.4 F1) show that our model significantly outperforms existing local models and efficiently achieves competitive results with global models.
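The local normalization step in this abstract can be sketched as a softmax over all spans that share a right boundary. The score table `scores[i][j]` (a score for the span from position i to position j) and the function names below are hypothetical; the abstract does not describe how scores are computed or how the predicted spans are assembled into a tree.

```python
import math
from typing import Dict, List


def local_span_distribution(scores: List[List[float]], j: int) -> List[float]:
    """Softmax over left boundaries i < j for spans (i, j) ending at split point j."""
    logits = [scores[i][j] for i in range(j)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)  # the normalizer over all spans ending at j
    return [e / z for e in exps]


def predict_left_boundaries(scores: List[List[float]], n: int) -> Dict[int, int]:
    """For each split point j in 1..n, pick the most probable left boundary."""
    prediction = {}
    for j in range(1, n + 1):
        probs = local_span_distribution(scores, j)
        prediction[j] = max(range(j), key=lambda i: probs[i])
    return prediction
```

Each split point is normalized independently of the others, so the loop over j has no cross-iteration dependency. That independence is what makes a locally normalized model of this kind easy to parallelize, in contrast to a global model that normalizes over whole trees.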

Preprint, 2020, arXiv

SRQA: Synthetic Reader for Factoid Question Answering

Question answering systems built on deep neural networks can answer questions from various fields and in various forms, but they still lack effective ways to handle multiple evidences. We introduce a new model called SRQA, which stands for Synthetic Reader for Factoid Question Answering. This model enhances the question answering system in the multi-document scenario from three aspects: model structure, optimization goal, and training method, corresponding to Multilayer Attention (MA), Cross Evidence (CE), and Adversarial Training (AT) respectively. First, we propose a multilayer attention network to obtain a better representation of the evidences. The multilayer attention mechanism conducts interaction between the question and the passage within each layer, making the token representation of evidences in each layer take the requirement of the question into account. Second, we design a cross evidence strategy to choose the answer span from among multiple evidences. We improve the optimization goal by considering all the answers' locations in multiple evidences as training targets, which leads the model to reason among multiple evidences. Third, adversarial training is applied to high-level variables in addition to the word embeddings in our model. A new normalization method is also proposed for adversarial perturbations so that we can jointly add perturbations to several target variables. As an effective regularization method, adversarial training enhances the model's ability to process noisy data. Combining these three strategies, we enhance the contextual representation and locating ability of our model, which can synthetically extract the answer span from several evidences. We evaluate SRQA on the WebQA dataset, and experiments show that our model outperforms the state-of-the-art models (the best fuzzy score of our model is up to 78.56%, an improvement of about 2%).