Source author record

Yang Shi

Yang Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC nlin.SI Machine Learning cs.CY cond-mat.mes-hall Information Retrieval Software Engineering Artificial Intelligence cond-mat.str-el Hardware Architecture Information Theory math-ph math.IT math.MP physics.app-ph

Catalog footprint

What is connected

21works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs

Vector approximate nearest neighbor search (ANNS) underpins search engines, recommendation systems, and advertising services. Recent advances in ANNS indexes make CPU a cost-effective choice for serving million-scale, in-memory vector search, yet per-core throughput remains constrained by memory access latency of vector reading and the compute intensity of distance evaluations in production deployments. With the growing scale of the business and advances in hardware, modern CCD-based multi-core CPUs have been widely deployed for high throughput in our services. However, we find that simply increasing core counts does not yield optimal performance scaling. To improve the efficiency of more cores from the CCD-based architecture, we analyze the distributions of real-world requests in our production environments. We observe high access locality in vector search in our online services and low cache utilization, resulting from overlooking the multi-chiplet nature of CCD based CPUs. Hence, we propose a workload- and hardware-aware thread orchestration framework at CCD-level that (i) provides a uniform interface for both inter-query parallel HNSW search and intra-query parallel IVF search, (ii) achieves cache-friendly and workload-adaptive mapping of task dispatching, and (iii) employs CCD-aware task stealing to address load imbalance. Applied to real production workloads from search, recommendation, and advertising services of Xiaohongshu (RedNote), our approach delivers up to 3.7x higher throughput and 30-90% reductions in P50 and P999 latency. In detail, compared with the original framework, the cache-miss ratio decreases by 6-30%, and the total CPU stall is reduced by 20-80%.

preprint2022arXiv

Accelerated Dual Averaging Methods for Decentralized Constrained Optimization

In this work, we study decentralized convex constrained optimization problems in networks. We focus on the dual averaging-based algorithmic framework that is well-documented to be superior in handling constraints and complex communication environments simultaneously. Two new decentralized dual averaging (DDA) algorithms are proposed. In the first one, a second-order dynamic average consensus protocol is tailored for DDA-type algorithms, which equips each agent with a provably more accurate estimate of the global dual variable than conventional schemes. We rigorously prove that the proposed algorithm attains $\mathcal{O}(1/t)$ convergence for general convex and smooth problems, for which existing DDA methods were only known to converge at $\mathcal{O}(1/\sqrt{t})$ prior to our work. In the second one, we use the extrapolation technique to accelerate the convergence of DDA. Compared to existing accelerated algorithms, where typically two different variables are exchanged among agents at each time, the proposed algorithm only seeks consensus on local gradients. Then, the extrapolation is performed based on two sequences of primal variables which are determined by the accumulations of gradients at two consecutive time instants, respectively. The algorithm is proved to converge at $\mathcal{O}(1)\left(\frac{1}{t^2}+\frac{1}{t(1-β)^2}\right)$, where $β$ denotes the second largest singular value of the mixing matrix. We remark that the condition for the algorithmic parameter to guarantee convergence does not rely on the spectrum of the mixing matrix, making itself easy to satisfy in practice. Finally, numerical results are presented to demonstrate the efficiency of the proposed methods.

preprint2022arXiv

Achieving Model Fairness in Vertical Federated Learning

Vertical federated learning (VFL) has attracted greater and greater interest since it enables multiple parties possessing non-overlapping features to strengthen their machine learning models without disclosing their private data and model parameters. Similar to other machine learning algorithms, VFL faces demands and challenges of fairness, i.e., the learned model may be unfairly discriminatory over some groups with sensitive attributes. To tackle this problem, we propose a fair VFL framework in this work. First, we systematically formulate the problem of training fair models in VFL, where the learning task is modelled as a constrained optimization problem. To solve it in a federated and privacy-preserving manner, we consider the equivalent dual form of the problem and develop an asynchronous gradient coordinate-descent ascent algorithm, where some active data parties perform multiple parallelized local updates per communication round to effectively reduce the number of communication rounds. The messages that the server sends to passive parties are deliberately designed such that the information necessary for local updates is released without intruding on the privacy of data and sensitive attributes. We rigorously study the convergence of the algorithm when applied to general nonconvex-concave min-max problems. We prove that the algorithm finds a $δ$-stationary point of the dual objective in $\mathcal{O}(δ^{-4})$ communication rounds under mild conditions. Finally, the extensive experiments on three benchmark datasets demonstrate the superior performance of our method in training fair models.

preprint2022arXiv

Code-DKT: A Code-based Knowledge Tracing Model for Programming Tasks

Knowledge tracing (KT) models are a popular approach for predicting students' future performance at practice problems using their prior attempts. Though many innovations have been made in KT, most models including the state-of-the-art Deep KT (DKT) mainly leverage each student's response either as correct or incorrect, ignoring its content. In this work, we propose Code-based Deep Knowledge Tracing (Code-DKT), a model that uses an attention mechanism to automatically extract and select domain-specific code features to extend DKT. We compared the effectiveness of Code-DKT against Bayesian and Deep Knowledge Tracing (BKT and DKT) on a dataset from a class of 50 students attempting to solve 5 introductory programming assignments. Our results show that Code-DKT consistently outperforms DKT by 3.07-4.00% AUC across the 5 assignments, a comparable improvement to other state-of-the-art domain-general KT models over DKT. Finally, we analyze problem-specific performance through a set of case studies for one assignment to demonstrate when and how code features improve Code-DKT's predictions.

preprint2022arXiv

Direct visualization of percolating metal-insulator transition in V2O3 using scanning microwave impedance microscopy

Using the extensively studied V2O3 as a prototype system, we investigate the role of percolation in metal-insulator transition (MIT). We apply scanning microwave impedance microscopy to directly determine the metallic phase fraction p and relate it to the macroscopic conductance G, which shows a sudden jump when p reaches the percolation threshold. Interestingly, the conductance G exhibits a hysteretic behavior against p, suggesting two different percolating processes upon cooling and warming. Based on our image analysis and model simulation, we ascribe such hysteretic behavior to different domain nucleation and growth processes between cooling and warming, which is likely caused by the decoupled structural and electronic transitions in V2O3 during MIT. Our work provides a microscopic view of how the interplay of structural and electronic degrees of freedom affects MIT in strongly correlated systems.

preprint2022arXiv

Learning Similarity Preserving Binary Codes for Recommender Systems

Hashing-based Recommender Systems (RSs) are widely studied to provide scalable services. The existing methods for the systems combine three modules to achieve efficiency: feature extraction, interaction modeling, and binarization. In this paper, we study an unexplored module combination for the hashing-based recommender systems, namely Compact Cross-Similarity Recommender (CCSR). Inspired by cross-modal retrieval, CCSR utilizes Maximum a Posteriori similarity instead of matrix factorization and rating reconstruction to model interactions between users and items. We conducted experiments on MovieLens1M, Amazon product review, Ichiba purchase dataset and confirmed CCSR outperformed the existing matrix factorization-based methods. On the Movielens1M dataset, the absolute performance improvements are up to 15.69% in NDCG and 4.29% in Recall. In addition, we extensively studied three binarization modules: $sign$, scaled tanh, and sign-scaled tanh. The result demonstrated that although differentiable scaled tanh is popular in recent discrete feature learning literature, a huge performance drop occurs when outputs of scaled $tanh$ are forced to be binary.

preprint2021arXiv

Toward Semi-Automatic Misconception Discovery Using Code Embeddings

Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses, using a state-of-the-art code classification model. We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions. We found these clusters correspond to specific misconceptions about the problem and would not have been easily discovered with existing approaches. We also discuss potential applications of our approach and how these misconceptions inform domain-specific insights into students' learning processes.

preprint2020arXiv

A unitary distributed subgradient method for multi-agent optimization with different coupling sources

In this work, we first consider distributed convex constrained optimization problems where the objective function is encoded by multiple local and possibly nonsmooth objectives privately held by a group of agents, and propose a distributed subgradient method with double averaging (abbreviated as ${\rm DSA_2}$) that only requires peer-to-peer communication and local computation to solve the global problem. The algorithmic framework builds on dual methods and dynamic average consensus; the sequence of test points is formed by iteratively minimizing a local dual model of the overall objective where the coefficients, i.e., approximated subgradients of the objective, are supplied by the dynamic average consensus scheme. We theoretically show that ${\rm DSA_2}$ enjoys non-ergodic convergence properties, i.e., the local minimizing sequence itself is convergent, a distinct feature that cannot be found in existing results. Specifically, we establish a convergence rate of $O(\frac{1}{\sqrt{t}})$ in terms of objective function error. Then, extensions are made to tackle distributed optimization problems with coupled functional constraints by combining ${\rm DSA_2}$ and dual decomposition. This is made possible by Lagrangian relaxation that transforms the coupling in constraints of the primal problem into that in cost functions of the dual, thus allowing us to solve the dual problem via ${\rm DSA_2}$. Both the dual objective error and the quadratic penalty for the coupled constraint are proved to converge at a rate of $O(\frac{1}{\sqrt{t}})$, and the primal objective error asymptotically vanishes. Numerical experiments and comparisons are conducted to illustrate the advantage of the proposed algorithms and validate our theoretical findings.

preprint2020arXiv

Integral-Type Event-Triggered Model Predictive Control of Nonlinear Systems with Additive Disturbance

This paper studies integral-type event-triggered model predictive control (MPC) of continuous-time nonlinear systems. An integral-type event-triggered mechanism is proposed by incorporating the integral of errors between the actual and predicted state sequences, leading to reduced average sampling frequency. Besides, a new and improved robustness constraint is introduced to handle the additive disturbance, rendering the MPC problem with a potentially enlarged initial feasible region. Furthermore, the feasibility of the designed MPC and the stability of the closed-loop system are rigorously investigated. Several sufficient conditions to guarantee these properties are established, which is related to factors such as the prediction horizon, the disturbance bound, the triggering level, and the contraction rate for the robustness constraint. The effectiveness of the proposed algorithm is illustrated by numerical examples and comparisons.

preprint2020arXiv

Resource-aware Exact Decentralized Optimization Using Event-triggered Broadcasting

This work addresses the decentralized optimization problem where a group of agents with coupled private objective functions work together to exactly optimize the summation of local interests. Upon modeling the decentralized problem as an equality-constrained centralized one, we leverage the linearized augmented Lagrangian method (LALM) to design an event-triggered decentralized algorithm that only requires light local computation at generic time instants and peer-to-peer communication at sporadic triggering time instants. The triggering time instants for each agent are locally determined by comparing the deviation between true and broadcast primal variables with certain triggering thresholds. Provided that the threshold is summable over time, we established a new upper bound for the effect of triggering behavior on the primal-dual residual. Based on this, the same convergence rate $O(\frac{1}{k})$ with periodic algorithms is secured for nonsmooth convex problems. Stronger convergence results have been further established for strongly convex and smooth problems, that is, the iterates linearly converge with exponentially decaying triggering thresholds.} We examine the developed strategy in two common optimization problems; comparison results illustrate its performance and superiority in exploiting communication resources.

preprint2020arXiv

Towards an $O(\frac{1}{t})$ convergence rate for distributed dual averaging

Recently, distributed dual averaging has received increasing attention due to its superiority in handling constraints and dynamic networks in multiagent optimization. However, all distributed dual averaging methods reported so far considered nonsmooth problems and have a convergence rate of $O(\frac{1}{\sqrt{t}})$. To achieve an improved convergence guarantee for smooth problems, this work proposes a second-order consensus scheme that assists each agent to locally track the global dual variable more accurately. This new scheme in conjunction with smoothness of the objective ensures that the accumulation of consensus error over time caused by incomplete global information is bounded from above. Then, a rigorous investigation of dual averaging with inexact gradient oracles is carried out to compensate the consensus error and achieve an $O(\frac{1}{t})$ convergence rate. The proposed method is examined in a large-scale LASSO problem.

preprint2020arXiv

Transition between Thin Film Boiling and Evaporation on Nanoporous Membranes Near the Kinetic Limit

Nanoporous structures including single nanopores and nanoporous membranes have been utilized as a platform to study fundamental liquid-vapor phase change heat transfer (PCHT) processes as well as a promising candidate for high flux heat dissipation. Both thin film boiling and evaporation through nanoporous structures have been demonstrated to achieve high heat flux, but they are usually considered two mutually exclusive regimes operated under vastly different conditions, and the factors dictating how close the PCHT process is to the kinetic limit are elusive. In this work, we utilized a unique transition between thin film boiling and evaporation through nanoporous membranes to clarify the factors determining the heat flux and heat transfer coefficient (HTC) with respect to the kinetic limit conditions. We unambiguously showed the controllable transition from boiling to evaporation, when the liquid receded into the nanopores and provided additional driving force from capillary pumping sustained in the nanoscale pores. We showed that this transition is universal and can be understood from a simple fluid transport model for all the four types of fluids we studied, which cover a wide span of surface tension (water, ethanol, isopropanol (IPA), FC-72). More importantly, PCHT conditions at the transition points between boiling and evaporation were close to those of the kinetic limit of all these fluids. However, further increase of the heat flux beyond the transition points led to decreasing HTC and deviation from the kinetic limit, which can be attributed to the increasing vapor resistance in the vapor space and inside the nanopores. This increasing vapor resistance was also confirmed by experiments on IPA with different vapor pressures.

preprint2017arXiv

Decentralized Charging Control of Electric Vehicles in Residential Distribution Networks

Electric vehicle (EV) charging can negatively impact electric distribution networks by exceeding equipment thermal ratings and causing voltages to drop below standard ranges. In this paper, we develop a decentralized EV charging control scheme to achieve "valley-filling" (i.e., flattening demand profile during overnight charging), meanwhile meeting heterogeneous individual charging requirements and satisfying distribution network constraints. The formulated problem is an optimization problem with a non-separable objective function and strongly coupled inequality constraints. We propose a novel shrunken primal-dual subgradient (SPDS) algorithm to support the decentralized control scheme, derive conditions guaranteeing its convergence, and verify its efficacy and convergence with a representative distribution network model.

preprint2016arXiv

Lattice equations arising from discrete Painlevé systems. II. $A_4^{(1)}$ case

In this paper, we construct two lattices from the $τ$ functions of $A_4^{(1)}$-surface $q$-Painlevé equations, on which quad-equations of ABS type appear. Moreover, using the reduced hypercube structure, we obtain the Lax pairs of the $A_4^{(1)}$-surface $q$-Painlevé equations.

preprint2016arXiv

Receding Horizon Consensus of General Linear Multi-agent Systems with Input Constraints: An Inverse Optimality Approach

It is desirable but challenging to fulfill system constraints and reach optimal performance in consensus protocol design for practical multi-agent systems (MASs). This paper investigates the optimal consensus problem for general linear MASs subject to control input constraints. Two classes of MASs including subsystems with semi-stable and unstable dynamics are considered. For both classes of MASs without input constraints, the results on designing optimal consensus protocols are first developed by inverse optimality approach. Utilizing the optimal consensus protocols, the receding horizon control (RHC)-based consensus strategies are designed for these two classes of MASs with input constraints. The conditions for assigning the cost functions distributively are derived, based on which the distributed RHC-based consensus frameworks are formulated. Next, the feasibility and consensus properties of the closed-loop systems are analyzed. It is shown that 1) the optimal performance indices under the inverse optimal consensus protocols are coupled with the network topologies and the system matrices of subsystems, but they are different for MASs with semi-stable and unstable subsystems; 2) the unstable modes of subsystems impose more stringent requirements for the parameter design; 3) the designed RHC-based consensus strategies can make the control input constraints fulfilled and ensure consensus for the closed-loop systems in both cases. But for MASs with semi-stable subsystems, the {\em convergent consensus} can be reached. Finally, two examples are provided to verify the effectiveness of the proposed results.

preprint2016arXiv

Reflection groups and discrete integrable systems

We present a method of constructing discrete integrable systems with crystallographic reflection group (Weyl) symmetries, thus clarifying the relationship between different discrete integrable systems in terms of their symmetry groups. Discrete integrable systems are associated with space-filling polytopes arise from the geometric representation of the Weyl groups in the $n$-dimensional real Euclidean space $\mathbb{R}^n$. The "multi-dimensional consistency" property of the discrete integrable system is shown to be inherited from the combinatorial properties of the polytope; while the dynamics of the system is described by the affine translations of the polytopes on the weight lattices of the Weyl groups. The connections between some well-known discrete systems such as the multi-dimensional consistent systems of quad-equations \cite{abs:03} and discrete Painlevé equations \cite{sak:01} are obtained via the geometric constraints that relate the polytope of one symmetry group to that of another symmetry group, a procedure which we call geometric reduction.

preprint2016arXiv

Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

Robust tensor CP decomposition involves decomposing a tensor into low rank and sparse components. We propose a novel non-convex iterative algorithm with guaranteed recovery. It alternates between low-rank CP decomposition through gradient ascent (a variant of the tensor power method), and hard thresholding of the residual. We prove convergence to the globally optimal solution under natural incoherence conditions on the low rank component, and bounded level of sparse perturbations. We compare our method with natural baselines which apply robust matrix PCA either to the {\em flattened} tensor, or to the matrix slices of the tensor. Our method can provably handle a far greater level of perturbation when the sparse tensor is block-structured. This naturally occurs in many applications such as the activity detection task in videos. Our experiments validate these findings. Thus, we establish that tensor methods can tolerate a higher level of gross corruptions compared to matrix methods.

preprint2015arXiv

A systematic approach to reductions of type-Q ABS equations

We present a class of reductions of Möbius type for the lattice equations known as Q1, Q2, and Q3 from the ABS list. The deautonomised form of one particular reduction of Q3 is shown to exist on the $A_1^{(1)}$ surface which belongs to the multiplicative type of rational surfaces in Sakai's classification of Painlevé systems. Using the growth of degrees of iterates, all other mappings that result from the class of reductions considered here are shown to be linearisable. Any possible linearisations are calculated explicitly by constructing a birational transformation defined by invariant curves in the blown up space of initial values for each reduction.

preprint2015arXiv

Lattice equations arising from discrete Painlevé systems (I): $(A_2+A_1)^{(1)}$ and $(A_1+A_1')^{(1)}$ cases

We introduce the concept of $ω$-lattice, constructed from $τ$ functions of Painlevé systems, on which quad-equations of ABS type appear. In particular, we consider the $A_5^{(1)}$- and $A_6^{(1)}$-surface $q$-Painlevé systems corresponding affine Weyl group symmetries are of $(A_2+A_1)^{(1)}$- and $(A_1+A_1)^{(1)}$-types, respectively.

preprint2014arXiv

Geometric Reductions of ABS equations on an $n$-cube to discrete Painlevé systems

In this paper, we show how to relate $n$-dimensional cubes on which ABS equations hold to the symmetry groups of discrete Painlevé equations. We here focus on the reduction from the 4-dimensional cube to the $q$-discrete third Painlevé equation, which is a dynamical system on a rational surface of type $A_5^{(1)}$ with the extended affine Weyl group $\widetilde{\mathcal W}\bigl((A_2+A_1)^{(1)}\bigr)$. We provide general theorems to show that this reduction also extends to other discrete Painlevé equations at least of type A.

preprint2011arXiv

Reversible arithmetic logic unit

Quantum computer requires quantum arithmetic. The sophisticated design of a reversible arithmetic logic unit (reversible ALU) for quantum arithmetic has been investigated in this letter. We provide explicit construction of reversible ALU effecting basic arithmetic operations. By provided the corresponding control unit, the proposed reversible ALU can combine the classical arithmetic and logic operation in a reversible integrated system. This letter provides actual evidence to prove the possibility of the realization of reversible Programmable Logic Device (RPLD) using reversible ALU.

Yang Shi

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs

Accelerated Dual Averaging Methods for Decentralized Constrained Optimization

Achieving Model Fairness in Vertical Federated Learning

Code-DKT: A Code-based Knowledge Tracing Model for Programming Tasks

Direct visualization of percolating metal-insulator transition in V2O3 using scanning microwave impedance microscopy

Learning Similarity Preserving Binary Codes for Recommender Systems

Toward Semi-Automatic Misconception Discovery Using Code Embeddings

A unitary distributed subgradient method for multi-agent optimization with different coupling sources

Integral-Type Event-Triggered Model Predictive Control of Nonlinear Systems with Additive Disturbance

Resource-aware Exact Decentralized Optimization Using Event-triggered Broadcasting

Towards an $O(\frac{1}{t})$ convergence rate for distributed dual averaging

Transition between Thin Film Boiling and Evaporation on Nanoporous Membranes Near the Kinetic Limit

Decentralized Charging Control of Electric Vehicles in Residential Distribution Networks

Lattice equations arising from discrete Painlevé systems. II. $A_4^{(1)}$ case

Receding Horizon Consensus of General Linear Multi-agent Systems with Input Constraints: An Inverse Optimality Approach

Reflection groups and discrete integrable systems

Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

A systematic approach to reductions of type-Q ABS equations

Lattice equations arising from discrete Painlevé systems (I): $(A_2+A_1)^{(1)}$ and $(A_1+A_1')^{(1)}$ cases

Geometric Reductions of ABS equations on an $n$-cube to discrete Painlevé systems

Reversible arithmetic logic unit