Source author record

Tao Yao

Tao Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.OC, hep-ph, math.AP, nucl-th, Artificial Intelligence, Computer Science and Game Theory, hep-ex, hep-lat, hep-th, Machine Learning, math.DS, math.ST, nucl-ex, Statistics Theory

Catalog footprint

What is connected

20 works

14 topics

4 close collaborators


preprint, 2026, arXiv

MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling

The substantial memory demands of pre-training and fine-tuning large language models (LLMs) require memory-efficient optimization algorithms. One promising approach is layer-wise optimization, which treats each transformer block as a single layer and optimizes it sequentially, while freezing the other layers to save optimizer states and activations. Although effective, these methods ignore the varying importance of the modules within each layer, leading to suboptimal performance. Moreover, layer-wise sampling provides only limited memory savings, as at least one full layer must remain active during optimization. To overcome these limitations, we propose Module-wise Importance SAmpling (MISA), a novel method that divides each layer into smaller modules and assigns importance scores to each module. MISA uses a weighted random sampling mechanism to activate modules, provably reducing gradient variance compared to layer-wise sampling. Additionally, we establish an $\mathcal{O}(1/\sqrt{K})$ convergence rate under non-convex and stochastic conditions, where $K$ is the total number of block updates, and provide a detailed memory analysis showcasing MISA's superiority over existing baseline methods. Experiments on diverse learning tasks validate the effectiveness of MISA. Source code is available at https://github.com/pkumelon/MISA.
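The weighted random sampling step described in this abstract can be illustrated with a minimal Python sketch. The module names, the importance scores, and the rule of activating one module per step are hypothetical choices made for illustration only; this is not the authors' implementation, which is available at the repository linked above.

```python
import random


def sampling_probabilities(importance):
    """Normalize non-negative importance scores into sampling probabilities."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("at least one module needs a positive importance score")
    return {name: score / total for name, score in importance.items()}


def sample_active_module(importance, rng):
    """Draw one module name with probability proportional to its score."""
    probs = sampling_probabilities(importance)
    names = list(probs)
    weights = [probs[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical importance scores for the modules of one transformer block.
scores = {
    "attn.q_proj": 4.0,
    "attn.v_proj": 3.0,
    "mlp.up_proj": 2.0,
    "mlp.down_proj": 1.0,
}

rng = random.Random(0)  # seeded so that repeated runs give the same draws
active = sample_active_module(scores, rng)
# In a training loop, only `active` would receive gradient updates and
# optimizer state in this step; all other modules would stay frozen.
```

Under these assumptions, a module with twice the score is activated about twice as often, so optimizer memory is only needed for the currently active module.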

preprint, 2020, arXiv

Spectrum and rearrangement decays of tetraquark states with four different flavors

We have systematically investigated the mass spectrum and rearrangement decay properties of the exotic tetraquark states with four different flavors using a color-magnetic interaction model. Their masses are estimated by assuming that the $X(4140)$ is a $cs\bar{c}\bar{s}$ tetraquark state and their decay widths are obtained by assuming that the Hamiltonian for decay is a constant. According to the adopted method, we find that the most stable states are probably the isoscalar $bs\bar{u}\bar{d}$ and $cs\bar{u}\bar{d}$ with $J^P=0^+$ and $1^+$. The widths of most unstable tetraquarks are on the order of tens of MeV, but those of the unstable $cu\bar{s}\bar{d}$ and $cs\bar{u}\bar{d}$ can be around 100 MeV. For the $X(5568)$, our method cannot give a consistent mass and width if it is a $bu\bar{s}\bar{d}$ tetraquark state. For the $I(J^P)=0(0^+),0(1^+)$ double-heavy $T_{bc}=bc\bar{u}\bar{d}$ states, the widths can be several MeV.

preprint, 2019, arXiv

Exclusive Production Ratio of Neutral over Charged Kaon Pair in $e^+e^-$ Annihilation Continuum via `Straton Model'

A completely relativistic quark model in the Bethe-Salpeter framework is employed to calculate the exclusive production ratio of the neutral over charged Kaon pair in the $e^+e^-$ annihilation continuum region for center-of-mass energies smaller than the $J/\Psi$ mass. The valence quark charge plays the key rôle. The cancellation of the diagrams for the same charge case (in $K_S + K_L$) and the non-cancellation of the diagrams for the different charge case (in $K^-+K^+$) lead to the ratio as $(m_s-m_d)^2/M_{Kaon}^2 \sim 1/10$.

preprint, 2016, arXiv

A robust optimization approach for dynamic traffic signal control with emission considerations

We consider an analytical signal control problem on a signalized network whose traffic flow dynamics are described by the Lighthill-Whitham-Richards (LWR) model (Lighthill and Whitham, 1955; Richards, 1956). This problem explicitly addresses traffic-derived emissions as side constraints. We seek to tackle this problem using a mixed integer mathematical programming approach. Such a class of problems, which we call LWR-Emission (LWR-E), has been analyzed before to a certain extent. Since mixed integer programs are practically efficient to solve in many cases (Bertsimas et al., 2011b), the mere fact of having integer variables is not the most significant challenge to solving LWR-E problems; rather, it is the presence of the potentially nonlinear and nonconvex emission-related constraints/objectives that renders the program computationally expensive. To address this computational challenge, we propose a novel reformulation of the LWR-E problem as a mixed integer linear program (MILP). This approach relies on the existence of a statistically valid macroscopic relationship between the aggregate emission rate and the vehicle occupancy of the same link. This relationship is approximated with certain functional forms and the associated uncertainties are handled explicitly using robust optimization (RO) techniques. The RO allows emissions-related constraints and/or objectives to be reformulated as linear forms under mild conditions. To further reduce the computational cost, we employ the link transmission model to describe traffic dynamics with the benefit of fewer (integer) variables and less potential traffic holding. The proposed MILP explicitly captures vehicle spillback, avoids traffic holding, and simultaneously minimizes travel delay and addresses emission-related concerns.

preprint, 2016, arXiv

A variational approach for continuous supply chain networks

We consider a continuous supply chain network consisting of buffering queues and processors first proposed by [D. Armbruster, P. Degond, and C. Ringhofer, SIAM J. Appl. Math., 66 (2006), pp. 896--920] and subsequently analyzed by [D. Armbruster, P. Degond, and C. Ringhofer, Bull. Inst. Math. Acad. Sin. (N.S.), 2 (2007), pp. 433--460] and [D. Armbruster, C. De Beer, M. Freitag, T. Jagalski, and C. Ringhofer, Phys. A, 363 (2006), pp. 104--114]. A model was proposed for such a network by [S. Göttlich, M. Herty, and A. Klar, Commun. Math. Sci., 3 (2005), pp. 545--559] using a system of coupled ordinary differential equations and partial differential equations. In this article, we propose an alternative approach based on a variational method to formulate the network dynamics. We also derive, based on the variational method, a computational algorithm that guarantees numerical stability, allows for rigorous error estimates, and facilitates efficient computations. A class of network flow optimization problems is formulated as mixed integer programs (MIPs). The proposed numerical algorithm and the corresponding MIP are compared theoretically and numerically with existing ones [A. Fügenschuh, S. Göttlich, M. Herty, A. Klar, and A. Martin, SIAM J. Sci. Comput., 30 (2008), pp. 1490--1507; S. Göttlich, M. Herty, and A. Klar, Commun. Math. Sci., 3 (2005), pp. 545--559], which demonstrates the modeling and computational advantages of the variational approach.

preprint, 2016, arXiv

Existence of simultaneous route and departure choice dynamic user equilibrium

This paper is concerned with the existence of the simultaneous route-and-departure choice dynamic user equilibrium (SRDC-DUE) in continuous time, first formulated as an infinite-dimensional variational inequality in Friesz et al. (1993). In deriving our existence result, we employ the generalized Vickrey model (GVM) introduced in earlier works to formulate the underlying network loading problem. As we explain, the GVM corresponds to a path delay operator that is provably strongly continuous on the Hilbert space of interest. Finally, we provide the desired SRDC-DUE existence result for general constraints relating path flows to a table of fixed trip volumes without invocation of a priori bounds on the path flows.

preprint, 2016, arXiv

Global solutions to folded concave penalized nonconvex learning

This paper is concerned with solving nonconvex learning problems with folded concave penalty. Although their global solutions entail desirable statistical properties, they lack optimization techniques that guarantee global optimality in a general setting. In this paper, we show that a class of nonconvex learning problems is equivalent to general quadratic programs. This equivalence allows us to develop mixed integer linear programming reformulations, which admit finite algorithms that find a provably global optimal solution. We refer to this reformulation-based technique as the mixed integer programming-based global optimization (MIPGO). To our knowledge, this is the first global optimization scheme with a theoretical guarantee for folded concave penalized nonconvex learning with the SCAD penalty [J. Amer. Statist. Assoc. 96 (2001) 1348-1360] and the MCP penalty [Ann. Statist. 38 (2010) 894-942]. Numerical results indicate that MIPGO significantly outperforms the state-of-the-art solution scheme, local linear approximation, and other alternative solution techniques in the literature in terms of solution quality.
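For orientation, the two folded concave penalties named in this abstract have standard closed forms. The Python sketch below evaluates them for a scalar argument; the function names and the default shape parameters (`a=3.7`, `gamma=3.0`) are illustrative assumptions, and the code shows only the penalties, not the MIPGO reformulation.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty at a scalar t with tuning parameter lam > 0 and shape a > 2."""
    x = abs(t)
    if x <= lam:
        # Linear (lasso-like) region near zero.
        return lam * x
    if x <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients are not penalized further.
    return lam * lam * (a + 1) / 2


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty at a scalar t with lam > 0 and shape gamma > 1."""
    x = abs(t)
    if x <= gamma * lam:
        # Concave region: the penalty slope decreases linearly to zero.
        return lam * x - x * x / (2 * gamma)
    # Constant region beyond gamma * lam.
    return gamma * lam * lam / 2
```

Both penalties grow like `lam * |t|` near zero and flatten out for large `|t|`, which is what makes the resulting learning problem nonconvex.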

preprint, 2016, arXiv

Network User Equilibrium with Elastic Demand: Formulation, Qualitative Analysis and Computation

In this paper we present a differential variational inequality formulation of dynamic network user equilibrium with elastic travel demand. We discuss its qualitative properties and provide algorithms for and examples of its solution.

preprint, 2016, arXiv

On the Continuum Approximation of the On-and-off Signal Control on Dynamic Traffic Networks

In the modeling of traffic networks, a signalized junction is typically treated using a binary variable to model the on-and-off nature of signal operation. While accurate, the use of binary variables can cause problems when studying large networks with many intersections. Instead, the signal control can be approximated through a continuum approach where the on-and-off control variable is replaced by a priority parameter. Advantages of such an approximation include elimination of the need for binary variables, lower time resolution requirements, and more flexibility and robustness in a decision environment. It also resolves the issue of discontinuous travel time functions arising from the context of dynamic traffic assignment. Despite these advantages in application, it is not clear from a theoretical point of view how accurate such a continuum approach is; i.e., to what extent it is a valid approximation for the on-and-off case. The goal of this paper is to answer these basic research questions and provide further guidance for the application of such a continuum signal model. In particular, by employing the Lighthill-Whitham-Richards model (Lighthill and Whitham, 1955; Richards, 1956) on a traffic network, we investigate the convergence of the on-and-off signal model to the continuum model in regimes of diminishing signal cycles. We also provide numerical analyses on the continuum approximation error when the signal cycles are not infinitesimal. As we explain, such convergence results and error estimates depend on the type of fundamental diagram assumed and whether or not vehicle spillback occurs in a network. Finally, a traffic signal optimization problem is presented and solved which illustrates the unique advantages of applying the continuum signal model instead of the on-and-off one.

preprint, 2016, arXiv

Second-order models and traffic data from mobile sensors

Mobile sensing enabled by GPS or smartphones has become an increasingly important source of traffic data. For sufficient coverage of the traffic stream, it is important to maintain a reasonable penetration rate of probe vehicles. From the standpoint of capturing higher-order traffic quantities such as acceleration/deceleration, emission and fuel consumption rates, it is desirable to examine the impact of the sampling frequency of vehicle positions on the estimation accuracy. Of the two issues raised above, the latter is rarely studied in the literature. This paper addresses the impact of both sampling frequency and penetration rate on mobile sensing of highway traffic. To capture inhomogeneous driving conditions and deviation of traffic from the equilibrium state, we employ the second-order phase transition model (PTM). Several data fusion schemes that incorporate vehicle trajectory data into the PTM are proposed. A case study of the NGSIM dataset is presented which shows the estimation results of various Eulerian and Lagrangian traffic quantities. The findings show that while first-order traffic quantities can be accurately estimated even with a low sampling frequency, higher-order traffic quantities, such as acceleration, deviation, and emission rate, tend to be misinterpreted due to insufficiently sampled vehicle locations. We also show that a correction factor approach has the potential to reduce the sensing error arising from low sampling frequency and penetration rate, making the estimation of higher-order quantities more robust against insufficient data coverage of the highway traffic.

preprint, 2014, arXiv

Colour connections of four quark $Q\bar{Q}Q'\bar{Q}'$ system and doubly heavy baryon production in $e^{+}e^{-}$ annihilation

The hadronization effects induced by various colour connections of the four quark system in $e^{+}e^{-}$ annihilation are briefly reviewed. A special colour connection of the four heavy quark system without colour-separation favours the production of doubly heavy baryons. For the related three-jet case, the corresponding hadronization has not been considered. We argue that it can be effectively described as two-string fragmentation besides the leading heavy diquark fragmentation. The production rate and the properties of final state hadron systems are discussed. Emphasis is on the string effect as a fingerprint of this hadronization procedure.

preprint, 2014, arXiv

Search for doubly charmed hadron at B factory

The doubly charmed hadron production at B factories is of special importance for the study of the hadron structure and the color connections before hadronization. To suppress the combination background fluctuations of the reconstructed hadron mass spectra, we suggest a three-jet event shape trigger. After these three jets are identified by their energy and angular distributions, it is found that: 1) The background process $e^+ e^- \to c\bar {c} \to h's$ in consideration of the final hadron system $\Lambda_c^+ K^- \pi^+ + X$ is significantly suppressed. 2) For the selected events, about half of the particles, $\Lambda_c^+$, $K^-$, $\pi^+$, which obviously cannot belong to the decay products of a doubly charmed hadron, can be vetoed. The relevant hadronization is investigated.

preprint, 2013, arXiv

Dynamic Congestion and Tolls with Mobile Source Emission

This paper proposes a dynamic congestion pricing model that takes into account mobile source emissions. We consider a tollable vehicular network where the users selfishly minimize their own travel costs, including travel time, early/late arrival penalties and tolls. On top of that, we assume that part of the network can be tolled by a central authority, whose objective is to minimize both total travel costs of road users and total emission on a network-wide level. The model is formulated as a mathematical program with equilibrium constraints (MPEC) problem and then reformulated as a mathematical program with complementarity constraints (MPCC). The MPCC is solved using a quadratic penalty-based gradient projection algorithm. A numerical study on a toy network illustrates the effectiveness of the tolling strategy and reveals a Braess-type paradox in the context of traffic-derived emission.

preprint, 2012, arXiv

A Link-based Mixed Integer LP Approach for Adaptive Traffic Signal Control

This paper is concerned with adaptive signal control problems on a road network, using a link-based kinematic wave model (Han et al., 2012). Such a model employs the Lighthill-Whitham-Richards model with a triangular fundamental diagram. A variational type argument (Lax, 1957; Newell, 1993) is applied so that the system dynamics can be determined without knowledge of the traffic state in the interior of each link. A Riemann problem for the signalized junction is explicitly solved, and an optimization problem is formulated in continuous time with the aid of binary variables. A time-discretization turns the optimization problem into a mixed integer linear program (MILP). Unlike the cell-based approaches (Daganzo, 1995; Lin and Wang, 2004; Lo, 1999b), the proposed framework does not require modeling or computation within a link, thus reducing the number of (binary) variables and computational effort. The proposed model is free of vehicle-holding problems, and captures important features of signalized networks such as physical queues, spillback, vehicle turning, time-varying flow patterns and dynamic signal timing plans. The MILP can be efficiently solved with standard optimization software.
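The triangular fundamental diagram mentioned in this abstract has a simple standard form: flow is the minimum of a free-flow branch and a congested branch. The Python sketch below evaluates it; the function names and the parameter values (30 m/s, 6 m/s, 0.15 veh/m) are hypothetical and are not taken from the paper.

```python
def triangular_flow(density, free_flow_speed, wave_speed, jam_density):
    """Flow q(rho) = min(v * rho, w * (rho_jam - rho)) for 0 <= rho <= rho_jam."""
    if not 0.0 <= density <= jam_density:
        raise ValueError("density must lie between 0 and the jam density")
    free_branch = free_flow_speed * density                 # uncongested traffic
    congested_branch = wave_speed * (jam_density - density)  # congested traffic
    return min(free_branch, congested_branch)


def critical_density(free_flow_speed, wave_speed, jam_density):
    """Density at which the two branches meet and the flow reaches capacity."""
    return wave_speed * jam_density / (free_flow_speed + wave_speed)


# Hypothetical single-lane parameters: speeds in m/s, densities in veh/m.
v, w, rho_jam = 30.0, 6.0, 0.15
rho_c = critical_density(v, w, rho_jam)
capacity = triangular_flow(rho_c, v, w, rho_jam)  # flow capacity in veh/s
```

Because both branches are linear in the density, sending and receiving capacities at a junction stay piecewise linear, which is one reason this diagram pairs naturally with a mixed integer linear formulation.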

preprint, 2012, arXiv

Competitive Robust Dynamic Pricing in Continuous Time with Fixed Inventories

The problem of robust dynamic pricing of an abstract commodity, whose inventory is specified at an initial time but never subsequently replenished, originally studied by Perakis and Sood (2006) in discrete time, is considered from the perspective of continuous time. We use a multiplicative demand function to model the uncertain demand, and develop a robust counterpart to replace the uncertain demand constraint. The sellers' robust best response problem yields a generalized Nash equilibrium problem, which can be formulated as an equivalent, continuous-time quasi-variational inequality. We demonstrate that, for appropriate regularity conditions, a generalized robust Nash equilibrium exists. We show that the quasi-variational inequality may be replaced by an equivalent variational inequality, and use a fixed-point algorithm to solve the variational inequality. We also demonstrate how explicit time lags associated with price updating in real-world decision environments, as well as specific pricing decision rules, may be introduced to create a dual time scale formulation and the associated solutions computed. We illustrate, via numerical examples, how robust pricing based on our DPFI formulation offers generally superior and never inferior worst case performance compared to nominal pricing.
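As a generic illustration of solving a variational inequality with a fixed-point algorithm, the Python sketch below iterates a projection map on a box-constrained toy problem. The map `toy_map`, the box bounds, and the step size are hypothetical; this is a finite-dimensional stand-in, not the continuous-time pricing model of the paper.

```python
def project_box(x, lower, upper):
    """Project a point onto the box [lower, upper], componentwise."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def solve_vi_fixed_point(F, x0, lower, upper, step=0.3, tol=1e-10, max_iter=10_000):
    """Iterate x <- P_K(x - step * F(x)) until successive iterates agree.

    A fixed point of this map solves the variational inequality
    <F(x*), y - x*> >= 0 for all y in the box K.
    """
    x = list(x0)
    for _ in range(max_iter):
        fx = F(x)
        trial = [xi - step * fi for xi, fi in zip(x, fx)]
        x_next = project_box(trial, lower, upper)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


def toy_map(x):
    """Hypothetical strongly monotone affine map F(x) = A x + b."""
    return [2.0 * x[0] + 0.5 * x[1] - 3.0,
            0.5 * x[0] + 1.0 * x[1] - 2.0]


solution = solve_vi_fixed_point(toy_map, [0.0, 0.0], [0.0, 0.0], [10.0, 10.0])
```

For this toy map the solution lies in the interior of the box, so it coincides with the root of `toy_map`, namely (8/7, 10/7). The iteration converges here because the map is strongly monotone and the step is small enough to make the update a contraction.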

preprint, 2012, arXiv

Existence and Properties of the State Operator in Dynamic User Equilibrium

In this paper, we establish and prove analytical properties of the state operator embedded in an optimal control problem, in the context of dynamic user equilibrium (DUE) models (Friesz et al. 1993).

preprint, 2012, arXiv

Lagrangian-based Hydrodynamic Model: Freeway Traffic Estimation

This paper is concerned with highway traffic estimation using traffic sensing data, in a Lagrangian-based modeling framework. We consider the Lighthill-Whitham-Richards (LWR) model (Lighthill and Whitham, 1955; Richards, 1956) in Lagrangian coordinates, and provide rigorous mathematical results regarding the equivalence of viscosity solutions to the Hamilton-Jacobi equations in Eulerian and Lagrangian coordinates. We derive closed-form solutions to the Lagrangian-based Hamilton-Jacobi equation using the Lax-Hopf formula (Daganzo, 2005; Aubin et al., 2008), and discuss issues of fusing traffic data of various types into the Lagrangian-based H-J equation. A numerical study of the Mobile Century field experiment (Herrera et al., 2009) demonstrates the unique modeling features and insights provided by the Lagrangian-based approach.

preprint, 2012, arXiv

Urban Freight Transportation Planning: A Dynamic Stackelberg Game-Theoretic Approach

In this paper we propose a dynamic Stackelberg game-theoretic model for urban freight transportation planning which is able to characterize the interaction between freight and personal transportation in an urban area. The problem is formulated as a bi-level dynamic mathematical program with equilibrium constraints (MPEC) which belongs to a class of computationally challenging problems. The lower level is a dynamic user equilibrium (DUE) with inhomogeneous traffic; the upper level is a system optimum (SO) freight transportation planning problem which aims at minimizing the total cost to a truck company. A mathematical program with complementarity constraints (MPCC) reformulation is derived and a projected gradient algorithm is designed to solve this computationally challenging problem. Numerical experiments are conducted to show that when planning freight transportation the background traffic is nonnegligible, even though the number of trucks compared to other vehicles traveling on the same network is relatively small. Moreover, in our proposed bi-level model for urban freight transportation planning, we find a dynamic case of a Braess-like paradox which can provide managerial insights to a metropolitan planning organization (MPO) in increasing social welfare by restricting freight movement.

preprint, 2010, arXiv

An Investigation of Hadronization Mechanism at $Z^{0}$ Factory

We briefly review the hadronization pictures adopted in the LUND String Fragmentation Model (LSFM), Webber Cluster Fragmentation Model (WCFM) and Quark Combination Model (QCM), respectively. Predictions of hadron multiplicity, baryon to meson ratios and baryon-antibaryon flavor correlations, especially related to heavy hadrons at a $Z^0$ factory, obtained by LSFM and QCM are reported.

preprint, 2010, arXiv

Unitarity and Entropy Change in Exclusive Quark Combination Models

Entropy change in exclusive quark combination models is not an isolated problem. Contrary to adding and tuning some parameters to the relevant model(s) to fix the entropy, we show that it relates to the most general principles. Unitarity of the combination model is demonstrated to play the central rôle that guarantees the non-decrease of the entropy in the exclusive combination process.
