Source author record

Qi Sun

Qi Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

26 works

28 topics

4 close collaborators


Preprint, 2026, arXiv

Chemistry in a Cryogenic Buffer Gas Cell

Cryogenic buffer gas sources are ubiquitous for producing cold, collimated molecular beams for quantum science, chemistry, and precision measurements. The molecules are typically produced by laser ablating a metal target in the presence of a donor gas. The radical of interest emerges due to a barrier-free reaction or under thermal or optical excitation. High-barrier reactions, such as between Ca and H$_2$, should be precluded. We study chemical reactions between Ca and three hydrogen isotopologues H$_2$, D$_2$, and HD in a cryogenic cell with helium buffer gas. We observe that H$_2$ can serve as both a reactant and a buffer gas, outperforming D$_2$ and HD. We use a reaction network model to describe the chemical dynamics and find that the enhanced molecular yield can be attributed to rapid vibrational excitations of the reactant gas. Our results demonstrate a robust method for generating bright cold beams of alkaline-earth-metal hydrides for laser cooling and trapping.

Preprint, 2026, arXiv

Neural Green's Function Accelerated Iterative Methods for Solving Indefinite Boundary Value Problems

Neural operators, which learn mappings between function spaces, have been applied to boundary value problems in various ways, including learning mappings from the space of forcing terms to the space of solutions, which requires a substantial number of data pairs. In this work, we present a data-free, physics-integrated neural operator that learns the Green kernel directly. Our method proceeds in three steps: 1. The governing equations for the Green's function are reformulated into an interface problem, in which the Dirac delta function is removed; 2. The interface problem is embedded in a lifted, higher-dimensional space to handle the jump in the derivative, but is still solved on a two-dimensional surface without additional sampling cost; 3. Deep neural networks are employed to address the curse of dimensionality caused by this lifting operation. The approximate Green's function obtained through our approach is then used to construct preconditioners for the linear systems, as permitted by its mathematical properties. Furthermore, its spectral bias, revealed through both theoretical analysis and numerical validation, contrasts with the smoothing effects of traditional iterative solvers, which motivates a hybrid iterative method that combines the two solvers. Numerical experiments demonstrate the effectiveness of our approximate Green's function in accelerating iterative methods, providing fast convergence for indefinite problems, even those involving discontinuous coefficients.

Preprint, 2026, arXiv

The ${\cal N}=1$ supersymmetric Pati-Salam models with extra $SU(2)_{L_2/R_2}$ gauge symmetry from intersecting D6-branes

By introducing an extra stack of D6-branes to the standard ${\cal N}=1$ supersymmetric Pati-Salam models, we extend the landscape covered by their complete search. In this construction, a $d$-stack of D6-branes is introduced in addition to the standard $a,~b,~c$-stacks. More intersections appear from the extra stacks of D6-branes, and thus Higgs/Higgs-like particles arise from more origins. Among these models, we find eight new classes of ${\cal N}=1$ supersymmetric Pati-Salam models with gauge symmetries $SU(4)_C\times SU(2)_L\times SU(2)_{R_1}\times SU(2)_{R_2}$ and $SU(4)_C\times SU(2)_{L_1}\times SU(2)_{R}\times SU(2)_{L_2}$, where the $d$-stack of D6-branes carries the gauge symmetries $SU(2)_{R_2}$ and $SU(2)_{L_2}$, respectively. The $SU(2)_{L_1/R_1} \times SU(2)_{L_2/R_2}$ can be broken down to the diagonal $SU(2)_{L/R}$ gauge symmetry via bifundamental Higgs fields. In this way, we successfully construct, for the first time, three-family supersymmetric Pati-Salam models from non-rigid D6-branes with extra $d$-stacks of D6-branes as visible sectors. Interestingly, introducing an extra stack of D6-branes to the standard supersymmetric Pati-Salam models generally reduces the number of filler branes, and models without any $USp(N)$ gauge symmetry eventually appear. This reduces the exotic particles from filler brane intersections while providing more vector-like particles from the ${\cal N}=2$ subsector, which are useful in renormalization group equation evolution. Moreover, an interesting degeneracy with the same gauge coupling ratio exists in certain classes of models.

Preprint, 2024, arXiv

The Molecular Characterizations of Variable Triebel-Lizorkin Spaces Associated with the Hermite Operator and Its Applications

In this article, we introduce inhomogeneous variable Triebel-Lizorkin spaces, $F_{p(\cdot),q(\cdot)}^{\alpha(\cdot),H}(\mathbb R^n)$, associated with the Hermite operator $H:=-\Delta+|x|^2$, where $\Delta$ is the Laplace operator on $\mathbb R^n$, and mainly establish the molecular characterization of these spaces. As applications, we obtain some regularity results for the fractional Hermite equations $$(-\Delta+|x|^2)^\sigma u=f,\quad (-\Delta+|x|^2+I)^\sigma u=f,$$ and the boundedness of spectral multipliers associated with the operator $H$ on the variable Triebel-Lizorkin space $F_{p(\cdot),q(\cdot)}^{\alpha(\cdot),H}(\mathbb R^n)$. Furthermore, we explain the relationship between $F_{p(\cdot),q(\cdot)}^{\alpha(\cdot),H}(\mathbb R^n)$ and the variable Triebel-Lizorkin spaces $F_{p(\cdot),q(\cdot)}^{\alpha(\cdot)}(\mathbb R^n)$ (introduced in Diening et al., J. Funct. Anal. 256 (2009), 1731-1768) via the atomic decomposition.

Preprint, 2022, arXiv

A Chit-Chats Enhanced Task-Oriented Dialogue Corpora for Fuse-Motive Conversation Systems

The goal of building intelligent dialogue systems has largely been pursued separately under two motives: task-oriented dialogue (TOD) systems and open-domain systems for chit-chat (CC). Although previous TOD systems work well on the test sets of benchmarks, they can fail undesirably when exposed to natural scenarios in practice, where user utterances can be highly motive-diverse, fusing both TOD and CC in multi-turn interaction. Since an industrial TOD system should be able to converse with the user across both TOD and CC motives, constructing a fuse-motive dialogue dataset that contains both TOD and CC is important. Most prior work relies on crowd workers to collect and annotate large-scale datasets and is restricted to the English-language setting. Our work, by contrast, addresses this problem in a more effective way and releases a multi-turn dialogue dataset called CCET (Chinese Chat-Enhanced-Task). We also propose an approach to formalizing fuse-motive dialogues, along with several evaluation metrics for TOD sessions that are integrated with CC utterances.

Preprint, 2022, arXiv

Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints

This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Firstly, a constrained generalized policy iteration (CGPI) framework is developed to handle state constraints by transforming the traditional policy improvement process into a constrained policy optimization problem. Next, we propose an actor-critic variant of CGPI, called CADP, in which both policy and value functions are approximated by multi-layer neural networks to directly map the system states to control inputs and value function, respectively. CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem. A trust region constraint is added to prevent excessive policy update, thus ensuring linearization accuracy. We determine the feasibility of the policy optimization problem by calculating the minimum trust region boundary and update the policy using two recovery rules when infeasible. The vehicle control problem in the path-tracking task is used to demonstrate the effectiveness of this proposed method.

Preprint, 2022, arXiv

Characterizations of variable fractional Hajłasz-Sobolev spaces

Let $(X, \rho, \mu)$ be a space of homogeneous type, and let the variable exponent satisfy the globally log-Hölder continuity condition. In this article, the author introduces the variable fractional Sobolev spaces on $X$ via the Hajłasz gradient. Using various maximal functions, several characterizations of these spaces are established.

Preprint, 2022, arXiv

Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview

Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising and has shown better performance than the others. However, a survey of the latest developments in deep learning-based methods is lacking. Therefore, this study presents a comprehensive review of recent progress in deep learning-based object pose detection and tracking. To allow a more thorough treatment, the scope of this study is limited to methods that take monocular RGB/RGBD data as input and cover three major kinds of tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. Metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with observations and future research directions.

Preprint, 2022, arXiv

FoV-NeRF: Foveated Neural Radiance Fields for Virtual Reality

Virtual Reality (VR) is becoming ubiquitous with the rise of consumer displays and commercial VR platforms. Such displays require low latency and high quality rendering of synthetic imagery with reduced compute overheads. Recent advances in neural rendering showed promise of unlocking new possibilities in 3D computer graphics via image-based representations of virtual or physical environments. Specifically, the neural radiance fields (NeRF) demonstrated that photo-realistic quality and continuous view changes of 3D scenes can be achieved without loss of view-dependent effects. While NeRF can significantly benefit rendering for VR applications, it faces unique challenges posed by high field-of-view, high resolution, and stereoscopic/egocentric viewing, typically causing low quality and high latency of the rendered images. In VR, this not only harms the interaction experience but may also cause sickness. To tackle these problems toward six-degrees-of-freedom, egocentric, and stereo NeRF in VR, we present the first gaze-contingent 3D neural representation and view synthesis method. We incorporate the human psychophysics of visual- and stereo-acuity into an egocentric neural representation of 3D scenery. We then jointly optimize the latency/performance and visual quality while mutually bridging human perception and neural scene synthesis to achieve perceptually high-quality immersive interaction. We conducted both objective analysis and subjective studies to evaluate the effectiveness of our approach. We find that our method significantly reduces latency (up to 99% time reduction compared with NeRF) without loss of high-fidelity rendering (perceptually identical to full-resolution ground truth). The presented approach may serve as the first step toward future VR/AR systems that capture, teleport, and visualize remote environments in real-time.

Preprint, 2022, arXiv

Image Features Influence Reaction Time: A Learned Probabilistic Perceptual Model for Saccade Latency

We aim to ask and answer an essential question "how quickly do we react after observing a displayed visual target?" To this end, we present psychophysical studies that characterize the remarkable disconnect between human saccadic behaviors and spatial visual acuity. Building on the results of our studies, we develop a perceptual model to predict temporal gaze behavior, particularly saccadic latency, as a function of the statistics of a displayed image. Specifically, we implement a neurologically-inspired probabilistic model that mimics the accumulation of confidence that leads to a perceptual decision. We validate our model with a series of objective measurements and user studies using an eye-tracked VR display. The results demonstrate that our model prediction is in statistical alignment with real-world human behavior. Further, we establish that many sub-threshold image modifications commonly introduced in graphics pipelines may significantly alter human reaction timing, even if the differences are visually undetectable. Finally, we show that our model can serve as a metric to predict and alter reaction latency of users in interactive computer graphics applications, thus may improve gaze-contingent rendering, design of virtual experiences, and player performance in e-sports. We illustrate this with two examples: estimating competition fairness in a video game with two different team colors, and tuning display viewing distance to minimize player reaction time.

Preprint, 2022, arXiv

Instant Reality: Gaze-Contingent Perceptual Optimization for 3D Virtual Reality Streaming

Media streaming has been adopted for a variety of applications such as entertainment, visualization, and design. Unlike video/audio streaming, where the content is usually consumed sequentially, 3D applications such as gaming require streaming 3D assets to facilitate client-side interactions such as object manipulation and viewpoint movement. Compared to audio and video streaming, 3D streaming often requires larger data sizes and yet lower latency to ensure sufficient rendering quality, resolution, and latency for perceptual comfort. Thus, streaming 3D assets can be even more challenging than streaming audio/video, and existing solutions often suffer from long loading times or limited quality. To address this critical and timely issue, we propose a perceptually-optimized progressive 3D streaming method for spatial quality and temporal consistency in immersive interactions. Based on the human visual mechanisms in the frequency domain, our model selects and schedules the streaming dataset for optimal spatial-temporal quality. We also train a neural network for our model to accelerate this decision process for real-time client-server applications. We evaluate our method via subjective studies and objective analysis under varying network conditions (from 3G to 5G) and client devices (HMD and traditional displays), and demonstrate better visual quality and temporal consistency than alternative solutions.

Preprint, 2021, arXiv

A Practical Layer-Parallel Training Algorithm for Residual Networks

Gradient-based algorithms for training ResNets typically require a forward pass of the input data, followed by back-propagating the objective gradient to update parameters, which is time-consuming for deep ResNets. To break the dependencies between modules in both the forward and backward modes, auxiliary-variable methods such as the penalty and augmented Lagrangian (AL) approaches have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we observe that large communication overhead and the lack of data augmentation are two key challenges of these methods, which may lead to a low speedup ratio and an accuracy drop across multiple compute devices. Inspired by the optimal control formulation of ResNets, we propose a novel serial-parallel hybrid training strategy to enable the use of data augmentation, together with downsampling filters to reduce the communication cost. The proposed strategy first trains the network parameters by solving a succession of independent sub-problems in parallel and then corrects the network parameters through a full serial forward-backward propagation of data. Such a strategy can be applied to most of the existing layer-parallel training methods using auxiliary variables. As an example, we validate the proposed strategy using penalty and AL methods on ResNet and WideResNet across the MNIST, CIFAR-10 and CIFAR-100 datasets, achieving significant speedup over the traditional layer-serial training methods while maintaining comparable accuracy.

Preprint, 2021, arXiv

An unfitted finite element method for two-phase Stokes problems with slip between phases

We present an isoparametric unfitted finite element approach of the CutFEM or Nitsche-XFEM family for the simulation of two-phase Stokes problems with slip between phases. For the unfitted generalized Taylor--Hood finite element pair $\mathbf{P}_{k+1}-P_k$, $k\ge1$, we show an inf-sup stability property with a stability constant that is independent of the viscosity ratio, slip coefficient, position of the interface with respect to the background mesh and, of course, mesh size. In addition, we prove stability and optimal error estimates that follow from this inf-sup property. We provide numerical results in two and three dimensions to corroborate the theoretical findings and demonstrate the robustness of our approach with respect to the contrast in viscosity, slip coefficient value, and position of the interface relative to the fixed computational mesh.

Preprint, 2021, arXiv

Decision-Making under On-Ramp merge Scenarios by Distributional Soft Actor-Critic Algorithm

Merging into the highway from the on-ramp is an essential scenario for automated driving. Decision-making in this scenario needs to balance safety and efficiency to optimize a long-term objective, which is challenging due to the dynamic, stochastic, and adversarial characteristics of the task. Rule-based methods often lead to conservative driving on this task, while learning-based methods have difficulty meeting the safety requirements. In this paper, we propose an RL-based end-to-end decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-critic (SDSAC). The SDSAC adopts policy evaluation with safety consideration in its offline training and a safety shield parameterized with the barrier function in its online correction. These two measures support each other for better safety while not severely damaging efficiency. We verify the SDSAC on an on-ramp merge scenario in simulation. The results show that the SDSAC has the best safety performance compared to baseline algorithms while also achieving efficient driving.

Preprint, 2021, arXiv

Recurrent Model Predictive Control

This paper proposes an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems. Unlike traditional Model Predictive Control (MPC) algorithms, it can make full use of the current computing resources and adaptively select the longest model prediction horizon. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The number of prediction steps is equal to the number of recurrent cycles of the learned policy function. With an arbitrary initial policy function, the proposed RMPC algorithm can converge to the optimal policy by directly minimizing the designed loss function. We further prove the convergence and optimality of the RMPC algorithm through the Bellman optimality principle, and demonstrate its generality and efficiency using two numerical examples.

Preprint, 2021, arXiv

Steadily Learn to Drive with Virtual Memory

Reinforcement learning has shown great potential in developing high-level autonomous driving. However, for high-dimensional tasks, current RL methods suffer from low data efficiency and oscillation in the training process. This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems. LVM compresses the high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent's experience. Various imagined latent trajectories are generated as virtual memory by the latent dynamic model. The policy is learned by propagating gradients through the learned latent model with the imagined latent trajectories, which leads to high data efficiency. Furthermore, a double critic structure is designed to reduce the oscillation during the training process. The effectiveness of LVM is demonstrated on an image-input autonomous driving task, in which LVM outperforms the existing method in terms of data efficiency, learning stability, and control performance.

Preprint, 2020, arXiv

Centralized Coordination of Connected Vehicles at Intersections using Graphical Mixed Integer Optimization

This paper proposes a centralized multi-vehicle coordination scheme serving unsignalized intersections. The whole process consists of three stages: a) target velocity optimization: formulate the collision-free vehicle coordination as a Mixed Integer Linear Programming (MILP) problem, with each incoming lane representing an independent variable; b) dynamic vehicle selection: build a directed graph from the result of the optimization, and retain only some of the vehicle nodes to coordinate by applying a subset extraction algorithm; c) synchronous velocity profile planning: bridge the gap between the current speed and the optimal velocity in a synchronous manner. The problem size is essentially bounded by the number of lanes rather than the number of vehicles. Thus the optimization process runs in real time with guaranteed solution quality. Simulation has verified the efficiency and real-time performance of the scheme.

Preprint, 2020, arXiv

Deep Multi Depth Panoramas for View Synthesis

We propose a learning-based approach for novel view synthesis for multi-camera 360$^{\circ}$ panorama capture rigs. Previous work constructs RGBD panoramas from such data, allowing for view synthesis with small amounts of translation, but cannot handle the disocclusions and view-dependent effects that are caused by large translations. To address this issue, we present a novel scene representation, the Multi Depth Panorama (MDP), that consists of multiple RGBD$\alpha$ panoramas that represent both scene geometry and appearance. We demonstrate a deep neural network-based method to reconstruct MDPs from multi-camera 360$^{\circ}$ images. MDPs are more compact than previous 3D scene representations and enable high-quality, efficient new view rendering. We demonstrate this via experiments on both synthetic and real data and comparisons with previous state-of-the-art methods spanning both learning-based approaches and classical RGBD-based methods.

Preprint, 2020, arXiv

DiffTaichi: Differentiable Programming for Physical Simulation

We present DiffTaichi, a new differentiable programming language tailored for building high-performance differentiable physical simulators. Based on an imperative programming language, DiffTaichi generates gradients of simulation steps using source code transformations that preserve arithmetic intensity and parallelism. A light-weight tape is used to record the whole simulation program structure and replay the gradient kernels in a reversed order, for end-to-end backpropagation. We demonstrate the performance and productivity of our language in gradient-based learning and optimization tasks on 10 different physical simulators. For example, a differentiable elastic object simulator written in our language is 4.2x shorter than the hand-engineered CUDA version yet runs as fast, and is 188x faster than the TensorFlow implementation. Using our differentiable programs, neural network controllers are typically optimized within only tens of iterations.

Preprint, 2020, arXiv

Integral-Type Event-Triggered Model Predictive Control of Nonlinear Systems with Additive Disturbance

This paper studies integral-type event-triggered model predictive control (MPC) of continuous-time nonlinear systems. An integral-type event-triggered mechanism is proposed by incorporating the integral of errors between the actual and predicted state sequences, leading to reduced average sampling frequency. In addition, a new and improved robustness constraint is introduced to handle the additive disturbance, giving the MPC problem a potentially enlarged initial feasible region. Furthermore, the feasibility of the designed MPC and the stability of the closed-loop system are rigorously investigated. Several sufficient conditions to guarantee these properties are established, which are related to factors such as the prediction horizon, the disturbance bound, the triggering level, and the contraction rate for the robustness constraint. The effectiveness of the proposed algorithm is illustrated by numerical examples and comparisons.

Preprint, 2020, arXiv

Mixed Reinforcement Learning with Additive Stochastic Uncertainty

Reinforcement learning (RL) methods often rely on massive exploration data to search for optimal policies, and suffer from poor sampling efficiency. This paper presents a mixed reinforcement learning (mixed RL) algorithm that simultaneously uses dual representations of environmental dynamics to search for the optimal policy, with the purpose of improving both learning accuracy and training speed. The dual representations are the environmental model and the state-action data: the former can accelerate the learning process of RL, while its inherent model uncertainty generally leads to worse policy accuracy than the latter, which comes from direct measurements of states and actions. In the framework design of the mixed RL, the compensation of the additive stochastic model uncertainty is embedded inside the policy iteration RL framework by using explored state-action data via an iterative Bayesian estimator (IBE). The optimal policy is then computed iteratively by alternating between policy evaluation (PEV) and policy improvement (PIM). The convergence of the mixed RL is proved using Bellman's principle of optimality, and the recursive stability of the generated policy is proved via Lyapunov's direct method. The effectiveness of the mixed RL is demonstrated by a typical optimal control problem of stochastic non-affine nonlinear systems (i.e., a double lane change task with an automated vehicle).

Preprint, 2016, arXiv

Application of Non-orthogonal Multiple Access in LTE and 5G Networks

As the latest member of the multiple access family, non-orthogonal multiple access (NOMA) has been recently proposed for 3GPP Long Term Evolution (LTE) and envisioned to be an essential component of 5th generation (5G) mobile networks. The key feature of NOMA is to serve multiple users at the same time/frequency/code, but with different power levels, which yields a significant spectral efficiency gain over conventional orthogonal MA. This article provides a systematic treatment of this newly emerging technology, from its combination with multiple-input multiple-output (MIMO) technologies, to cooperative NOMA, as well as the interplay between NOMA and cognitive radio. This article also reviews the state of the art in the standardization activities concerning the implementation of NOMA in LTE and 5G networks.

Preprint, 2016, arXiv

Multilevel Monte Carlo Finite Element Method for A Stochastic Optimal Control Problem

In this paper, we consider the application of the multilevel Monte Carlo method to a stochastic optimal control problem with log-normal coefficients and its surrogate model problem. From the perspective of two optimization problems, i.e., maximizing the accuracy for a fixed computational cost and minimizing the total computational cost to attain a given accuracy, we derive formulas to determine the optimal sample sizes for each level of the multilevel Monte Carlo method. Furthermore, we put forward the multilevel Monte Carlo algorithm for our stochastic optimal control problem and some techniques for dealing with the multilevel log-normal coefficients. Finally, we present numerical results for both the elliptic SPDEs and our control problem to validate the effectiveness of the method compared with the traditional Monte Carlo method.
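
For orientation, the two sample-size optimization problems named in this abstract have well-known closed-form solutions in the standard multilevel Monte Carlo setting. The sketch below is that textbook result, not necessarily the formula derived in the paper. The notation is assumed here and does not come from the abstract: $N_\ell$ is the number of samples on level $\ell$, $V_\ell$ the variance of the level-$\ell$ correction term, $C_\ell$ the cost of one sample on level $\ell$, $\varepsilon^2$ the target estimator variance, and $C$ the cost budget.

$$\min_{N_0,\dots,N_L}\ \sum_{\ell=0}^{L} N_\ell C_\ell \quad \text{s.t.} \quad \sum_{\ell=0}^{L} \frac{V_\ell}{N_\ell} = \varepsilon^2 \quad\Longrightarrow\quad N_\ell = \varepsilon^{-2}\sqrt{\frac{V_\ell}{C_\ell}}\ \sum_{k=0}^{L}\sqrt{V_k C_k},$$

$$\min_{N_0,\dots,N_L}\ \sum_{\ell=0}^{L} \frac{V_\ell}{N_\ell} \quad \text{s.t.} \quad \sum_{\ell=0}^{L} N_\ell C_\ell = C \quad\Longrightarrow\quad N_\ell = C\,\sqrt{\frac{V_\ell}{C_\ell}}\ \Big/ \sum_{k=0}^{L}\sqrt{V_k C_k}.$$

Both follow from a Lagrange multiplier argument with $N_\ell$ treated as continuous; in either case $N_\ell \propto \sqrt{V_\ell / C_\ell}$, and in practice the values are rounded up to integers.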

Preprint, 2015, arXiv

Full Duplex Networking: Mission Impossible?

Mobile traffic is projected to increase 1000 times from 2010 to 2020. This poses significant challenges for 5th generation (5G) wireless communication system design, including network structure, air interface, key transmission schemes, multiple access, and duplexing schemes. In this paper, full duplex networking issues are discussed, aiming to provide some insights into the design and possible future deployment of 5G. In particular, the interference scenarios in full duplex are analyzed, followed by discussions of several candidate interference mitigation approaches, interference-proof frame structures, transceiver structures for channel reciprocity recovery, and a super full duplex base station where each sector operates in time division duplex (TDD) mode. The extension of TDD and frequency division duplex (FDD) to full duplex is also examined. It is anticipated that with future standardization and deployment of full duplex systems, TDD and FDD will be harmoniously integrated, supporting all the existing half duplex mobile phones efficiently, and leading to substantially enhanced 5G system performance.

Preprint, 2015, arXiv

Ultra-fast Multiple Genome Sequence Matching Using GPU

In this paper, we present a comparative evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching, based on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. Besides requiring only approximately 20%-30% of the space of the suffix tree, the suffix array clearly outperforms the suffix tree on the GPU thanks to coalesced binary search and tile optimization. The experimental results show that multiple genome sequence matching based on the suffix array achieves a speedup of more than 99 times over the serial CPU implementation. Massively parallel matching based on the suffix array is therefore an efficient approach for high-performance bioinformatics applications.
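
As a rough illustration of the data structure this abstract relies on, the following is a minimal serial sketch of pattern matching by binary search over a suffix array. It is not the paper's GPU implementation: there is no coalesced search or tiling, and the construction is the naive sort-based one. The function names `build_suffix_array` and `find_occurrences` and the sample sequence are hypothetical, chosen only for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive construction by sorting suffix slices; fine for a short
    illustration, far too slow for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions where `pattern` occurs in `text`.

    Suffixes that begin with `pattern` form one contiguous run in the
    suffix array, so two binary searches locate the run's boundaries.
    """
    m = len(pattern)

    # First binary search: leftmost suffix whose m-character prefix >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Second binary search: leftmost suffix whose m-character prefix > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    genome = "GATTACAGATTA"  # made-up sequence for illustration
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ATTA"))
```

Each query costs O(m log n) character comparisons for a pattern of length m and a text of length n, and separate queries are independent of one another, which is the property that makes this kind of search a natural candidate for parallel execution.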

Preprint, 2013, arXiv

Optimizing Synchronization Algorithm for Auto-parallelizing Compiler

In this paper, we focus on two approaches to optimizing producer-consumer synchronization for an auto-parallelizing compiler. Emphasis is placed on constructing a criterion model by which the compiler reduces the number of synchronization operations needed to synchronize the dependences in a loop, and on an optimization that reduces the overhead of enforcing all dependences. Based on our study, we modify and eliminate dependences on the iteration space diagram (ISD) and treat the cases of acyclic and cyclic dependence in detail. We eliminate partial dependences and optimize the synchronization instructions. Some didactic examples are included to illustrate the optimization procedure.
