Source author record

Jun Yao

Jun Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.mes-hall; Artificial Intelligence; Computation and Language; Machine Learning; physics.optics; cond-mat.mtrl-sci; physics.comp-ph; Computer Vision; Distributed, Parallel, and Cluster Computing; math.NA; Neural and Evolutionary Computing; physics.app-ph; physics.flu-dyn

Catalog footprint

What is connected

17 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the core competencies of scientific intelligence, including Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding, Scientific Symbolic Reasoning, Scientific Code Generation, Science Hypothesis Generation and Scientific Knowledge Understanding. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. SciEvalKit builds a foundation of expert-grade scientific benchmarks, curated from real-world, domain-specific datasets, ensuring that tasks reflect authentic scientific challenges. The toolkit features a flexible, extensible evaluation pipeline that enables batch evaluation across models and datasets, supports custom model and dataset integration, and provides transparent, reproducible, and comparable results. By bridging capability-based evaluation and disciplinary diversity, SciEvalKit offers a standardized yet customizable infrastructure to benchmark the next generation of scientific foundation models and intelligent agents. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.

preprint · 2023 · arXiv

PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation

The recent trend in large language models (LLMs) is to increase both model size (a.k.a. the number of parameters) and dataset size to achieve better generative ability, as demonstrated by work such as GPT and Llama. However, large models often incur massive computational costs that practical applications cannot afford. Moreover, how to construct a strong model architecture for LLMs is rarely discussed. We first analyze the state-of-the-art language model architectures and observe the feature collapse problem. Based on the theoretical analysis, we propose that nonlinearity, which is usually studied in convolutional neural networks for vision tasks, is also very important for language models. A series informed activation function is then introduced at negligible computational cost, and an augmented shortcut is further used to enhance the model nonlinearity. We then demonstrate through carefully designed ablations that the proposed approach significantly enhances the model nonlinearity; thus, we present a new efficient model architecture for modern LLMs, namely PanGu-$π$. Experiments are then conducted using the same dataset and training strategy to compare PanGu-$π$ with state-of-the-art LLMs. The results show that PanGu-$π$-7B achieves performance comparable to that of benchmark models with about a 10% inference speed-up, and that PanGu-$π$-1B achieves state-of-the-art performance in terms of accuracy and efficiency. In addition, we have deployed PanGu-$π$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application. The results show that YunShan can surpass other models of similar scale on benchmarks.

preprint · 2022 · arXiv

Generation of optical vortices imitating water vortices

In optics, vortex beams can be generated using specific methods such as spiral phase plates or computer-generated holograms. In nature, meanwhile, water can produce vortices when it flows through a circularly symmetric hole. Can a light beam likewise generate a vortex when it is diffracted by an aperture? Here, we show that the light field in the Fresnel region of a diffracted circularly polarized beam carries orbital angular momentum, which can be transferred to trapped particles and drive their orbital rotation.

preprint · 2021 · arXiv

Deep Stock Trading: A Hierarchical Reinforcement Learning Framework for Portfolio Optimization and Order Execution

Portfolio management via reinforcement learning is at the forefront of fintech research; it explores how to optimally reallocate a fund across different financial assets over the long term by trial and error. Existing methods are impractical because they usually assume that each reallocation can be finished immediately and thus ignore price slippage as part of the trading cost. To address these issues, we propose a hierarchical reinforced stock trading system for portfolio management (HRPM). Concretely, we decompose the trading process into a hierarchy of portfolio management over trade execution and train the corresponding policies. The high-level policy gives portfolio weights at a lower frequency to maximize the long-term profit and invokes the low-level policy to sell or buy the corresponding shares within a short time window at a higher frequency to minimize the trading cost. We train the two levels of policies via a pre-training scheme and an iterative training scheme for data efficiency. Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvement over many state-of-the-art approaches.

preprint · 2021 · arXiv

Sub-Architecture Ensemble Pruning in Neural Architecture Search

Neural architecture search (NAS) has gained increasing attention in recent years due to its flexibility and remarkable capability to reduce the burden of neural network design. To achieve better performance, however, the search process usually requires massive computation that might not be affordable for researchers and practitioners. Although recent attempts have employed ensemble learning methods to mitigate the enormous computational cost, they neglect a key property of ensemble methods, namely diversity, which leads to collecting more similar sub-architectures with potential redundancy in the final design. To tackle this problem, we propose a pruning method for NAS ensembles called "Sub-Architecture Ensemble Pruning in Neural Architecture Search (SAEP)." It aims to leverage diversity and to obtain smaller sub-ensemble architectures with performance comparable to that of unpruned ensemble architectures. Three possible solutions are proposed to decide which sub-architectures to prune during the search process. Experimental results demonstrate the effectiveness of the proposed method, which largely reduces the number of sub-architectures without degrading performance.

preprint · 2020 · arXiv

A fully discrete energy stable scheme for a phase-field moving contact line model with variable densities and viscosities

In this work, we propose a fully discrete energy stable scheme for the phase-field moving contact line model with variable densities and viscosities. The mathematical model consists of a Cahn-Hilliard equation, a Navier-Stokes equation and the generalized Navier boundary condition for the moving contact line. A scalar auxiliary variable is adopted to transform the governing system into an equivalent form, allowing the double well potential to be treated semi-explicitly. A stabilization term is added to balance the explicit nonlinear term originating from the surface energy at the fluid-solid interface. A pressure stabilization method is used to decouple the computation of velocity and pressure. Some subtle implicit-explicit treatments are adopted to deal with the convection and stress terms. We establish a rigorous proof of energy stability for the proposed time-marching scheme. A finite difference method on staggered grids is then used to spatially discretize the constructed time-marching scheme. We further prove that the fully discrete scheme also satisfies the discrete energy dissipation law. Numerical results demonstrate the accuracy and energy stability of the proposed scheme. Using our numerical scheme, we analyze the contact line dynamics through a shear-flow-driven droplet sliding case. Three-dimensional droplet spreading is also investigated on a chemically patterned surface. Our numerical simulation accurately predicts the expected energy evolution and successfully reproduces the expected phenomenon that an oil droplet contracts inwards on a hydrophobic zone and spreads outwards quickly on a hydrophilic zone.

preprint · 2020 · arXiv

Correction of Faulty Background Knowledge based on Condition Aware and Revise Transformer for Question Answering

The study of question answering has received increasing attention in recent years. This work focuses on providing an answer that is compatible with both user intent and conditioning information corresponding to the question, such as delivery status and stock information in e-commerce. However, these conditions may be wrong or incomplete in real-world applications. Although existing question answering systems have considered external information, such as categorical attributes and triples in a knowledge base, they all assume that the external information is correct and complete. To alleviate the effect of defective condition values, this paper proposes the condition aware and revise Transformer (CAR-Transformer). CAR-Transformer (1) revises each condition value based on the whole conversation and the original condition values, and (2) encodes the revised conditions and uses the condition embedding to select an answer. Experimental results on a real-world customer service dataset demonstrate that the CAR-Transformer can still select an appropriate reply when the conditions corresponding to the question contain wrong or missing values, and that it substantially outperforms baseline models on automatic and human evaluations. The proposed CAR-Transformer can be extended to other NLP tasks that need to consider conditioning information.

preprint · 2020 · arXiv

Highly transparent contacts to the 1D hole gas in ultra-scaled Ge/Si core/shell nanowires

Semiconductor-superconductor hybrid systems have outstanding potential for emerging high-performance nanoelectronics and quantum devices. However, critical to their successful application is the fabrication of high-quality and reproducible semiconductor-superconductor interfaces. Here, we realize and measure axial Al-Ge-Al nanowire heterostructures with atomically precise interfaces, enwrapped by an ultrathin epitaxial Si layer, further denoted as Al-Ge/Si-Al nanowire heterostructures. The heterostructures were synthesized by a thermally induced exchange reaction of single-crystalline Ge/Si core/shell nanowires and lithographically defined Al contact pads. Applying this heterostructure formation scheme enables self-aligned quasi one-dimensional crystalline Al leads contacting ultrascaled Ge/Si segments with contact transparencies greater than 96%. Integration into back-gated field-effect devices and continuous scaling beyond lithographic limitations allow us to exploit the full potential of the highly transparent contacts to the 1D hole gas at the Ge-Si interface. This leads to the observation of ballistic transport as well as quantum confinement effects up to temperatures of 150 K. Low-temperature measurements reveal proximity-induced superconductivity in the Ge/Si core/shell nanowires. The realization of a Josephson field-effect transistor allows us to study the subgap structure caused by multiple Andreev reflections. Most importantly, the absence of a quantum dot regime indicates a hard superconducting gap originating from the highly transparent contacts to the 1D hole gas, which is potentially interesting for the study of Majorana zero modes. Moreover, underlining the importance of the proposed thermally induced Al-Ge/Si-Al heterostructure formation technique, our system could contribute to the development of key components of quantum computing such as gatemon or transmon qubits.

preprint · 2020 · arXiv

Spin filtering in germanium/silicon core/shell nanowires with pseudo-helical gap

Semiconductors with strong spin-orbit interactions can exhibit a helical gap with spin-momentum locking opened by a magnetic field. Such a gap is highly spin selective as a result of a topologically protected spin-momentum locking, which can be used for spin filtering. We experimentally demonstrate such a spin filtering effect in a quasi-ballistic p-type germanium/silicon core/shell nanowire (NW), which possesses a pseudo-helical gap without the application of a magnetic field. Polarized hole spin injection into the NW is achieved using cobalt ferromagnetic contacts, with a controlled natural surface oxide on the NW serving as a tunnel barrier. Local and nonlocal spin valve effects are measured as verification of polarized spin transport in the NW outside the helical gap. We electrically tune the NW into the helical gap by scanning its chemical potential with a gate. A hysteresis loop with three resistance states is observed in the local spin valve geometry, as evidence of spin filtering in the helical gap.

preprint · 2019 · arXiv

Numerical approximation of a phase-field surfactant model with fluid flow

Modelling interfacial dynamics with soluble surfactants in a multiphase system is a challenging task. Here, we consider the numerical approximation of a phase-field surfactant model with fluid flow. The nonlinearly coupled model consists of two Cahn-Hilliard-type equations and the incompressible Navier-Stokes equation. With the introduction of two auxiliary variables, the governing system is transformed into an equivalent form, which allows the nonlinear potentials to be treated efficiently and semi-explicitly. Through certain subtle explicit-implicit treatments of the stress and convective terms, we construct first- and second-order time-marching schemes for the transformed governing system that are extremely efficient and easy to implement. At each time step, the schemes involve solving only a sequence of linear elliptic equations, and the computations of the phase-field variables, velocity and pressure are fully decoupled. We further establish a rigorous proof of unconditional energy stability for the first-order scheme. Numerical results in both two and three dimensions demonstrate that the proposed schemes are accurate, efficient and unconditionally energy stable. Using our schemes, we investigate the effect of surfactants on droplet deformation and collision under a shear flow, where an increase in surfactant concentration can enhance droplet deformation and inhibit droplet coalescence.

preprint · 2015 · arXiv

Hierarchical multiscale modeling for flows in fractured media using Generalized Multiscale Finite Element Method

In this paper, we develop a multiscale finite element method for solving flows in fractured media. Our approach is based on the Generalized Multiscale Finite Element Method (GMsFEM), where we represent the fracture effects on a coarse grid via multiscale basis functions. These multiscale basis functions are constructed in the offline stage via local spectral problems following GMsFEM. To represent the fractures on the fine grid, we consider two approaches, (1) the Discrete Fracture Model (DFM) and (2) the Embedded Fracture Model (EFM), and their combination. In DFM, the fractures are resolved via the fine grid, while in EFM the interaction between the fracture and the fine grid block is represented as a source term. In the proposed multiscale method, additional multiscale basis functions are used to represent the long fractures, while short fractures are collectively represented by a single basis function. The procedure is done automatically via local spectral problems. In this regard, our approach shares common concepts with several approaches proposed in the literature, as we discuss. Numerical results are presented in which we demonstrate how one can adaptively add basis functions in the regions of interest based on error indicators. We also discuss the use of randomized snapshots, which reduces the offline computational cost.

preprint · 2014 · arXiv

Nanoscale simulation of shale transport properties using the lattice Boltzmann method: permeability and diffusivity

Porous structures of shales are reconstructed based on scanning electron microscopy (SEM) images of shale samples from the Sichuan Basin, China. Characterization analyses of the nanoscale reconstructed shales are performed, including porosity, pore size distribution, specific surface area and pore connectivity. The multiple-relaxation-time (MRT) lattice Boltzmann method (LBM) fluid flow model and the single-relaxation-time (SRT) LBM diffusion model are adopted to simulate the fluid flow and the Knudsen diffusion process within the reconstructed shales, respectively. Tortuosity, intrinsic permeability and effective Knudsen diffusivity are numerically predicted. The tortuosity is much higher than that commonly employed in the Bruggeman equation. The intrinsic permeability is corrected by taking into consideration the contribution of Knudsen diffusion, which leads to the apparent permeability. The correction factor at different Knudsen numbers and pressures is estimated and compared with existing corrections reported in the literature. For the wide pressure range under investigation, the correction factor is always greater than 1, indicating that Knudsen diffusion always plays a role in the transport mechanisms of shale gas in the shales studied here. Most of the values of the correction factor are located in the transition regime, with no Darcy flow regime observed.

preprint · 2013 · arXiv

Evolution of photo-excited carrier distribution from anisotropic to isotropic and isotropic photon absorption in graphene

Femtosecond time-resolved spectroscopy using a 400 nm pump and an 800 nm probe in CVD-grown multilayer graphene provides strong evidence for an isotropic distribution of photoexcited carriers after initial relaxation. Indicative of such an isotropic distribution is the pump-polarization independence of the differential reflectivity (ΔR/R) and transmittance (ΔT/T) in pump-probe measurements. Combined with results using an 800 nm pump in [arXiv:1301.1743v3 (2013)], these pump-polarization dependences corroborate the evolution of the photo-excited carrier distribution from anisotropic to isotropic with carrier relaxation. In addition, the absorbance of graphene is identical for in-plane and out-of-plane optical fields. Regardless of the carrier distribution in momentum space, the influence of carriers on in-plane and out-of-plane optical fields through the state-filling effect is identical. The sign reversal of the picosecond dynamics signal in graphene/graphite should not be directly related to carriers.

preprint · 2013 · arXiv

Experimental observation of polarization-dependent ultrafast carrier dynamics in multi-layer graphene

The polarization characteristics of ultrafast carrier dynamics in multi-layer CVD-grown graphene are probed with tilted beams (with respect to the graphene plane). The measured ultrafast carrier dynamics depend strongly on both the polarization (i.e., the orientation of linear polarization) and the wave vector of the probe beam. The differential reflectivity ΔR/R signal of the picosecond dynamics can be continuously altered from positive to negative by changing the probe polarization from P to S when the dynamics are probed by a totally internally reflected beam. The polarization-dependent ΔR/R signal around zero delay time is positive. It can be altered to negative by changing the probe polarization if the probe beam is not totally internally reflected. However, no sign reversal was observed for the differential transmittance ΔT/T. These unexpected results indicate the anisotropy of graphene carrier dynamics. Thus, the ultrafast carrier dynamics should be further studied with consideration of the anisotropic structure (in and out of the graphene plane) of graphene.

preprint · 2012 · arXiv

Terahertz and Infrared Spectroscopy of Gated Large-Area Graphene

We have fabricated a centimeter-size single-layer graphene device, with a gate electrode, which can modulate the transmission of terahertz and infrared waves. Using time-domain terahertz spectroscopy and Fourier-transform infrared spectroscopy in a wide frequency range (10-10000 cm^{-1}), we measured the dynamic conductivity change induced by electrical gating and thermal annealing. Both methods were able to effectively tune the Fermi energy, E_F, which in turn modified the Drude-like intraband absorption in the terahertz as well as the '2E_F onset' for interband absorption in the mid-infrared. These results not only provide fundamental insight into the electromagnetic response of Dirac fermions in graphene but also demonstrate the key functionalities of large-area graphene devices that are desired for components in terahertz and infrared optoelectronics.

preprint · 2011 · arXiv

In Situ Imaging of the Conducting Filament in a Silicon Oxide Resistive Switch

The nature of the conducting filaments in many resistive switching systems has been elusive. Through in situ transmission electron microscopy, we image the real-time formation and evolution of the filament in a silicon oxide resistive switch. The electroforming process is revealed to involve the local enrichment of silicon from the silicon oxide matrix. Semi-metallic silicon nanocrystals with structural variations from the conventional diamond cubic form of silicon are observed, which likely account for the conduction in the filament. The growth and shrinkage of the silicon nanocrystals in response to different electrical stimuli show energetically viable transition processes in the silicon forms, offering evidence for the switching mechanism. The study also provides insights into the electrical breakdown process in silicon oxide layers, which are ubiquitous in a host of electronic devices.

preprint · 2010 · arXiv

A High-confidence Cyber-Physical Alarm System: Design and Implementation

Most traditional alarm systems cannot address security threats in a satisfactory manner. To alleviate this problem, we developed a high-confidence cyber-physical alarm system (CPAS), a new kind of alarm system. This system connects to the Internet (i.e., TCP/IP) through GPRS/CDMA/3G. It achieves mutual communication and control among terminal equipment, human-machine interfaces and users by using the existing mobile communication network. The CPAS enables the transformation of the alarm mode from a traditional one-way alarm to a two-way alarm. The system has been successfully applied in practice. The results show that the CPAS can avoid false alarms and satisfy residents' security needs.
