Researcher profile

Jun Yao

Jun Yao contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
10 topics
4 close collaborators

Published work

10 published items

preprint, 2026, arXiv

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the core competencies of scientific intelligence, including Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding, Scientific Symbolic Reasoning, Scientific Code Generation, Science Hypothesis Generation and Scientific Knowledge Understanding. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. SciEvalKit builds a foundation of expert-grade scientific benchmarks, curated from real-world, domain-specific datasets, ensuring that tasks reflect authentic scientific challenges. The toolkit features a flexible, extensible evaluation pipeline that enables batch evaluation across models and datasets, supports custom model and dataset integration, and provides transparent, reproducible, and comparable results. By bridging capability-based evaluation and disciplinary diversity, SciEvalKit offers a standardized yet customizable infrastructure to benchmark the next generation of scientific foundation models and intelligent agents. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.

preprint, 2023, arXiv

PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation

The recent trend in large language models (LLMs) is to increase both model size (i.e., the number of parameters) and dataset scale to achieve better generative ability, as demonstrated by well-known work such as GPT and Llama. However, large models often involve massive computational costs that practical applications cannot afford, and methods for constructing a strong model architecture for LLMs are rarely discussed. We first analyze the state-of-the-art language model architectures and observe the feature collapse problem. Based on the theoretical analysis, we propose that nonlinearity, which is usually studied in convolutional neural networks for vision tasks, is also very important for language models. A series informed activation function is then introduced with negligible additional computation, and an augmented shortcut is further used to enhance the model nonlinearity. We then demonstrate through carefully designed ablations that the proposed approach is effective at enhancing the model nonlinearity; thus, we present a new efficient model architecture, namely PanGu-$π$. Experiments are then conducted using the same dataset and training strategy to compare PanGu-$π$ with state-of-the-art LLMs. The results show that PanGu-$π$-7B can achieve performance comparable to that of benchmark models with about a 10% inference speed-up, and PanGu-$π$-1B can achieve state-of-the-art performance in terms of accuracy and efficiency. In addition, we have deployed PanGu-$π$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application. The results show that YunShan can surpass other models of similar scale on benchmarks.

preprint, 2022, arXiv

Generation of optical vortices imitating water vortices

In optics, vortex beams can be generated using specific methods such as spiral phase plates or computer-generated holograms. In nature, meanwhile, water can produce vortices through a circularly symmetric hole. Can a light beam likewise generate a vortex when it is diffracted by an aperture? Here, we show that the light field in the Fresnel region of a diffracted circularly polarized beam carries orbital angular momentum, which can be transferred to trapped particles and make them rotate orbitally.

preprint, 2021, arXiv

Deep Stock Trading: A Hierarchical Reinforcement Learning Framework for Portfolio Optimization and Order Execution

Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial and error. Existing methods are impractical since they usually assume each reallocation can be finished immediately and thus ignore price slippage as part of the trading cost. To address these issues, we propose a hierarchical reinforced stock trading system for portfolio management (HRPM). Concretely, we decompose the trading process into a hierarchy of portfolio management over trade execution and train the corresponding policies. The high-level policy gives portfolio weights at a lower frequency to maximize the long-term profit and invokes the low-level policy to sell or buy the corresponding shares within a short time window at a higher frequency to minimize the trading cost. We train the two levels of policies via a pre-training scheme and an iterative training scheme for data efficiency. Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvement over many state-of-the-art approaches.

preprint, 2021, arXiv

Sub-Architecture Ensemble Pruning in Neural Architecture Search

Neural architecture search (NAS) has gained increasing attention in recent years due to its flexibility and remarkable capability to reduce the burden of neural network design. To achieve better performance, however, the search process usually requires massive computation that might not be affordable for researchers and practitioners. Although recent attempts have employed ensemble learning methods to mitigate the enormous computational cost, they neglect a key property of ensemble methods, namely diversity, which leads to collecting more similar sub-architectures with potential redundancy in the final design. To tackle this problem, we propose a pruning method for NAS ensembles called "Sub-Architecture Ensemble Pruning in Neural Architecture Search (SAEP)." It aims to leverage diversity and to obtain smaller sub-ensemble architectures with performance comparable to that of unpruned ensemble architectures. Three possible solutions are proposed to decide which sub-architectures to prune during the search process. Experimental results demonstrate the effectiveness of the proposed method, which largely reduces the number of sub-architectures without degrading performance.

preprint, 2020, arXiv

A fully discrete energy stable scheme for a phase-field moving contact line model with variable densities and viscosities

In this work, we propose a fully discrete energy stable scheme for the phase-field moving contact line model with variable densities and viscosities. The mathematical model consists of a Cahn-Hilliard equation, a Navier-Stokes equation and the generalized Navier boundary condition for the moving contact line. A scalar auxiliary variable is adopted to transform the governing system into an equivalent form, allowing the double well potential to be treated semi-explicitly. A stabilization term is added to balance the explicit nonlinear term originating from the surface energy at the fluid-solid interface. A pressure stabilization method is used to decouple the computation of velocity and pressure. Some subtle implicit-explicit treatments are adopted to deal with the convection and stress terms. We establish a rigorous proof of energy stability for the proposed time-marching scheme. Then a finite difference method on staggered grids is used to spatially discretize the constructed time-marching scheme. We further prove that the fully discrete scheme also satisfies the discrete energy dissipation law. Numerical results demonstrate the accuracy and energy stability of the proposed scheme. Using our numerical scheme, we analyze the contact line dynamics through a shear flow driven droplet sliding case. Three-dimensional droplet spreading is also investigated on a chemically patterned surface. Our numerical simulation accurately predicts the expected energy evolutions and successfully reproduces the expected phenomenon that an oil droplet contracts inwards on a hydrophobic zone and spreads outwards quickly on a hydrophilic zone.

preprint, 2020, arXiv

Correction of Faulty Background Knowledge based on Condition Aware and Revise Transformer for Question Answering

The study of question answering has received increasing attention in recent years. This work focuses on providing an answer that is compatible with both user intent and conditioning information corresponding to the question, such as delivery status and stock information in e-commerce. However, these conditions may be wrong or incomplete in real-world applications. Although existing question answering systems have considered external information, such as categorical attributes and triples in a knowledge base, they all assume that the external information is correct and complete. To alleviate the effect of defective condition values, this paper proposes the condition aware and revise Transformer (CAR-Transformer). CAR-Transformer (1) revises each condition value based on the whole conversation and the original condition values, and (2) encodes the revised conditions and uses the condition embeddings to select an answer. Experimental results on a real-world customer service dataset demonstrate that the CAR-Transformer can still select an appropriate reply when the conditions corresponding to the question contain wrong or missing values, and that it substantially outperforms baseline models on automatic and human evaluations. The proposed CAR-Transformer can be extended to other NLP tasks that need to consider conditioning information.

preprint, 2020, arXiv

Highly transparent contacts to the 1D hole gas in ultra-scaled Ge/Si core/shell nanowires

Semiconductor-superconductor hybrid systems have outstanding potential for emerging high-performance nanoelectronics and quantum devices. However, critical to their successful application is the fabrication of high-quality and reproducible semiconductor-superconductor interfaces. Here, we realize and measure axial Al-Ge-Al nanowire heterostructures with atomically precise interfaces, enwrapped by an ultrathin epitaxial Si layer, further denoted as Al-Ge/Si-Al nanowire heterostructures. The heterostructures were synthesized by a thermally induced exchange reaction of single-crystalline Ge/Si core/shell nanowires and lithographically defined Al contact pads. Applying this heterostructure formation scheme enables self-aligned quasi one-dimensional crystalline Al leads contacting ultrascaled Ge/Si segments with contact transparencies greater than 96%. Integration into back-gated field-effect devices and continuous scaling beyond lithographic limitations allow us to exploit the full potential of the highly transparent contacts to the 1D hole gas at the Ge-Si interface. This leads to the observation of ballistic transport as well as quantum confinement effects up to temperatures of 150 K. Low-temperature measurements reveal proximity-induced superconductivity in the Ge/Si core/shell nanowires. The realization of a Josephson field-effect transistor allows us to study the subgap structure caused by multiple Andreev reflections. Most importantly, the absence of a quantum dot regime indicates a hard superconducting gap originating from the highly transparent contacts to the 1D hole gas, which is potentially interesting for the study of Majorana zero modes. Moreover, underlining the importance of the proposed thermally induced Al-Ge/Si-Al heterostructure formation technique, our system could contribute to the development of key components of quantum computing such as gatemon or transmon qubits.

preprint, 2020, arXiv

Spin filtering in germanium/silicon core/shell nanowires with pseudo-helical gap

Semiconductors with strong spin-orbit interactions can exhibit a helical gap with spin-momentum locking opened by a magnetic field. Such a gap is highly spin selective as a result of topologically protected spin-momentum locking, which can be used for spin filtering. We experimentally demonstrate such a spin filtering effect in a quasi-ballistic p-type germanium/silicon core/shell nanowire (NW), which possesses a pseudo-helical gap without the application of a magnetic field. Polarized hole spin injection into the NW is achieved using cobalt ferromagnetic contacts, with a controlled natural surface oxide on the NW serving as a tunnel barrier. Local and nonlocal spin valve effects are measured to verify polarized spin transport in the NW outside the helical gap. We electrically tune the NW into the helical gap by scanning its chemical potential with a gate. A hysteresis loop with three resistance states is observed in the local spin valve geometry, as evidence of spin filtering in the helical gap.

preprint, 2019, arXiv

Numerical approximation of a phase-field surfactant model with fluid flow

Modelling interfacial dynamics with soluble surfactants in a multiphase system is a challenging task. Here, we consider the numerical approximation of a phase-field surfactant model with fluid flow. The nonlinearly coupled model consists of two Cahn-Hilliard-type equations and the incompressible Navier-Stokes equation. With the introduction of two auxiliary variables, the governing system is transformed into an equivalent form, which allows the nonlinear potentials to be treated efficiently and semi-explicitly. Through certain subtle explicit-implicit treatments of the stress and convective terms, we construct first- and second-order time-marching schemes for the transformed governing system that are extremely efficient and easy to implement. At each time step, the schemes involve solving only a sequence of linear elliptic equations, and the computations of the phase-field variables, velocity and pressure are fully decoupled. We further establish a rigorous proof of unconditional energy stability for the first-order scheme. Numerical results in both two and three dimensions demonstrate that the proposed schemes are accurate, efficient and unconditionally energy stable. Using our schemes, we investigate the effect of surfactants on droplet deformation and collision under a shear flow, where an increase in surfactant concentration can enhance droplet deformation and inhibit droplet coalescence.