Researcher profile

Yang Shi

Yang Shi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs

Vector approximate nearest neighbor search (ANNS) underpins search engines, recommendation systems, and advertising services. Recent advances in ANNS indexes make CPU a cost-effective choice for serving million-scale, in-memory vector search, yet per-core throughput remains constrained by memory access latency of vector reading and the compute intensity of distance evaluations in production deployments. With the growing scale of the business and advances in hardware, modern CCD-based multi-core CPUs have been widely deployed for high throughput in our services. However, we find that simply increasing core counts does not yield optimal performance scaling. To improve the efficiency of more cores from the CCD-based architecture, we analyze the distributions of real-world requests in our production environments. We observe high access locality in vector search in our online services and low cache utilization, resulting from overlooking the multi-chiplet nature of CCD based CPUs. Hence, we propose a workload- and hardware-aware thread orchestration framework at CCD-level that (i) provides a uniform interface for both inter-query parallel HNSW search and intra-query parallel IVF search, (ii) achieves cache-friendly and workload-adaptive mapping of task dispatching, and (iii) employs CCD-aware task stealing to address load imbalance. Applied to real production workloads from search, recommendation, and advertising services of Xiaohongshu (RedNote), our approach delivers up to 3.7x higher throughput and 30-90% reductions in P50 and P999 latency. In detail, compared with the original framework, the cache-miss ratio decreases by 6-30%, and the total CPU stall is reduced by 20-80%.

preprint2022arXiv

Accelerated Dual Averaging Methods for Decentralized Constrained Optimization

In this work, we study decentralized convex constrained optimization problems in networks. We focus on the dual averaging-based algorithmic framework that is well-documented to be superior in handling constraints and complex communication environments simultaneously. Two new decentralized dual averaging (DDA) algorithms are proposed. In the first one, a second-order dynamic average consensus protocol is tailored for DDA-type algorithms, which equips each agent with a provably more accurate estimate of the global dual variable than conventional schemes. We rigorously prove that the proposed algorithm attains $\mathcal{O}(1/t)$ convergence for general convex and smooth problems, for which existing DDA methods were only known to converge at $\mathcal{O}(1/\sqrt{t})$ prior to our work. In the second one, we use the extrapolation technique to accelerate the convergence of DDA. Compared to existing accelerated algorithms, where typically two different variables are exchanged among agents at each time, the proposed algorithm only seeks consensus on local gradients. Then, the extrapolation is performed based on two sequences of primal variables which are determined by the accumulations of gradients at two consecutive time instants, respectively. The algorithm is proved to converge at $\mathcal{O}(1)\left(\frac{1}{t^2}+\frac{1}{t(1-β)^2}\right)$, where $β$ denotes the second largest singular value of the mixing matrix. We remark that the condition for the algorithmic parameter to guarantee convergence does not rely on the spectrum of the mixing matrix, making itself easy to satisfy in practice. Finally, numerical results are presented to demonstrate the efficiency of the proposed methods.

preprint2022arXiv

Achieving Model Fairness in Vertical Federated Learning

Vertical federated learning (VFL) has attracted greater and greater interest since it enables multiple parties possessing non-overlapping features to strengthen their machine learning models without disclosing their private data and model parameters. Similar to other machine learning algorithms, VFL faces demands and challenges of fairness, i.e., the learned model may be unfairly discriminatory over some groups with sensitive attributes. To tackle this problem, we propose a fair VFL framework in this work. First, we systematically formulate the problem of training fair models in VFL, where the learning task is modelled as a constrained optimization problem. To solve it in a federated and privacy-preserving manner, we consider the equivalent dual form of the problem and develop an asynchronous gradient coordinate-descent ascent algorithm, where some active data parties perform multiple parallelized local updates per communication round to effectively reduce the number of communication rounds. The messages that the server sends to passive parties are deliberately designed such that the information necessary for local updates is released without intruding on the privacy of data and sensitive attributes. We rigorously study the convergence of the algorithm when applied to general nonconvex-concave min-max problems. We prove that the algorithm finds a $δ$-stationary point of the dual objective in $\mathcal{O}(δ^{-4})$ communication rounds under mild conditions. Finally, the extensive experiments on three benchmark datasets demonstrate the superior performance of our method in training fair models.

preprint2022arXiv

Code-DKT: A Code-based Knowledge Tracing Model for Programming Tasks

Knowledge tracing (KT) models are a popular approach for predicting students' future performance at practice problems using their prior attempts. Though many innovations have been made in KT, most models including the state-of-the-art Deep KT (DKT) mainly leverage each student's response either as correct or incorrect, ignoring its content. In this work, we propose Code-based Deep Knowledge Tracing (Code-DKT), a model that uses an attention mechanism to automatically extract and select domain-specific code features to extend DKT. We compared the effectiveness of Code-DKT against Bayesian and Deep Knowledge Tracing (BKT and DKT) on a dataset from a class of 50 students attempting to solve 5 introductory programming assignments. Our results show that Code-DKT consistently outperforms DKT by 3.07-4.00% AUC across the 5 assignments, a comparable improvement to other state-of-the-art domain-general KT models over DKT. Finally, we analyze problem-specific performance through a set of case studies for one assignment to demonstrate when and how code features improve Code-DKT's predictions.

preprint2022arXiv

Direct visualization of percolating metal-insulator transition in V2O3 using scanning microwave impedance microscopy

Using the extensively studied V2O3 as a prototype system, we investigate the role of percolation in metal-insulator transition (MIT). We apply scanning microwave impedance microscopy to directly determine the metallic phase fraction p and relate it to the macroscopic conductance G, which shows a sudden jump when p reaches the percolation threshold. Interestingly, the conductance G exhibits a hysteretic behavior against p, suggesting two different percolating processes upon cooling and warming. Based on our image analysis and model simulation, we ascribe such hysteretic behavior to different domain nucleation and growth processes between cooling and warming, which is likely caused by the decoupled structural and electronic transitions in V2O3 during MIT. Our work provides a microscopic view of how the interplay of structural and electronic degrees of freedom affects MIT in strongly correlated systems.

preprint2022arXiv

Learning Similarity Preserving Binary Codes for Recommender Systems

Hashing-based Recommender Systems (RSs) are widely studied to provide scalable services. The existing methods for the systems combine three modules to achieve efficiency: feature extraction, interaction modeling, and binarization. In this paper, we study an unexplored module combination for the hashing-based recommender systems, namely Compact Cross-Similarity Recommender (CCSR). Inspired by cross-modal retrieval, CCSR utilizes Maximum a Posteriori similarity instead of matrix factorization and rating reconstruction to model interactions between users and items. We conducted experiments on MovieLens1M, Amazon product review, Ichiba purchase dataset and confirmed CCSR outperformed the existing matrix factorization-based methods. On the Movielens1M dataset, the absolute performance improvements are up to 15.69% in NDCG and 4.29% in Recall. In addition, we extensively studied three binarization modules: $sign$, scaled tanh, and sign-scaled tanh. The result demonstrated that although differentiable scaled tanh is popular in recent discrete feature learning literature, a huge performance drop occurs when outputs of scaled $tanh$ are forced to be binary.

preprint2021arXiv

Toward Semi-Automatic Misconception Discovery Using Code Embeddings

Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses, using a state-of-the-art code classification model. We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions. We found these clusters correspond to specific misconceptions about the problem and would not have been easily discovered with existing approaches. We also discuss potential applications of our approach and how these misconceptions inform domain-specific insights into students' learning processes.

preprint2020arXiv

A unitary distributed subgradient method for multi-agent optimization with different coupling sources

In this work, we first consider distributed convex constrained optimization problems where the objective function is encoded by multiple local and possibly nonsmooth objectives privately held by a group of agents, and propose a distributed subgradient method with double averaging (abbreviated as ${\rm DSA_2}$) that only requires peer-to-peer communication and local computation to solve the global problem. The algorithmic framework builds on dual methods and dynamic average consensus; the sequence of test points is formed by iteratively minimizing a local dual model of the overall objective where the coefficients, i.e., approximated subgradients of the objective, are supplied by the dynamic average consensus scheme. We theoretically show that ${\rm DSA_2}$ enjoys non-ergodic convergence properties, i.e., the local minimizing sequence itself is convergent, a distinct feature that cannot be found in existing results. Specifically, we establish a convergence rate of $O(\frac{1}{\sqrt{t}})$ in terms of objective function error. Then, extensions are made to tackle distributed optimization problems with coupled functional constraints by combining ${\rm DSA_2}$ and dual decomposition. This is made possible by Lagrangian relaxation that transforms the coupling in constraints of the primal problem into that in cost functions of the dual, thus allowing us to solve the dual problem via ${\rm DSA_2}$. Both the dual objective error and the quadratic penalty for the coupled constraint are proved to converge at a rate of $O(\frac{1}{\sqrt{t}})$, and the primal objective error asymptotically vanishes. Numerical experiments and comparisons are conducted to illustrate the advantage of the proposed algorithms and validate our theoretical findings.

preprint2020arXiv

Integral-Type Event-Triggered Model Predictive Control of Nonlinear Systems with Additive Disturbance

This paper studies integral-type event-triggered model predictive control (MPC) of continuous-time nonlinear systems. An integral-type event-triggered mechanism is proposed by incorporating the integral of errors between the actual and predicted state sequences, leading to reduced average sampling frequency. Besides, a new and improved robustness constraint is introduced to handle the additive disturbance, rendering the MPC problem with a potentially enlarged initial feasible region. Furthermore, the feasibility of the designed MPC and the stability of the closed-loop system are rigorously investigated. Several sufficient conditions to guarantee these properties are established, which is related to factors such as the prediction horizon, the disturbance bound, the triggering level, and the contraction rate for the robustness constraint. The effectiveness of the proposed algorithm is illustrated by numerical examples and comparisons.

preprint2020arXiv

Resource-aware Exact Decentralized Optimization Using Event-triggered Broadcasting

This work addresses the decentralized optimization problem where a group of agents with coupled private objective functions work together to exactly optimize the summation of local interests. Upon modeling the decentralized problem as an equality-constrained centralized one, we leverage the linearized augmented Lagrangian method (LALM) to design an event-triggered decentralized algorithm that only requires light local computation at generic time instants and peer-to-peer communication at sporadic triggering time instants. The triggering time instants for each agent are locally determined by comparing the deviation between true and broadcast primal variables with certain triggering thresholds. Provided that the threshold is summable over time, we established a new upper bound for the effect of triggering behavior on the primal-dual residual. Based on this, the same convergence rate $O(\frac{1}{k})$ with periodic algorithms is secured for nonsmooth convex problems. Stronger convergence results have been further established for strongly convex and smooth problems, that is, the iterates linearly converge with exponentially decaying triggering thresholds.} We examine the developed strategy in two common optimization problems; comparison results illustrate its performance and superiority in exploiting communication resources.

preprint2020arXiv

Towards an $O(\frac{1}{t})$ convergence rate for distributed dual averaging

Recently, distributed dual averaging has received increasing attention due to its superiority in handling constraints and dynamic networks in multiagent optimization. However, all distributed dual averaging methods reported so far considered nonsmooth problems and have a convergence rate of $O(\frac{1}{\sqrt{t}})$. To achieve an improved convergence guarantee for smooth problems, this work proposes a second-order consensus scheme that assists each agent to locally track the global dual variable more accurately. This new scheme in conjunction with smoothness of the objective ensures that the accumulation of consensus error over time caused by incomplete global information is bounded from above. Then, a rigorous investigation of dual averaging with inexact gradient oracles is carried out to compensate the consensus error and achieve an $O(\frac{1}{t})$ convergence rate. The proposed method is examined in a large-scale LASSO problem.

preprint2020arXiv

Transition between Thin Film Boiling and Evaporation on Nanoporous Membranes Near the Kinetic Limit

Nanoporous structures including single nanopores and nanoporous membranes have been utilized as a platform to study fundamental liquid-vapor phase change heat transfer (PCHT) processes as well as a promising candidate for high flux heat dissipation. Both thin film boiling and evaporation through nanoporous structures have been demonstrated to achieve high heat flux, but they are usually considered two mutually exclusive regimes operated under vastly different conditions, and the factors dictating how close the PCHT process is to the kinetic limit are elusive. In this work, we utilized a unique transition between thin film boiling and evaporation through nanoporous membranes to clarify the factors determining the heat flux and heat transfer coefficient (HTC) with respect to the kinetic limit conditions. We unambiguously showed the controllable transition from boiling to evaporation, when the liquid receded into the nanopores and provided additional driving force from capillary pumping sustained in the nanoscale pores. We showed that this transition is universal and can be understood from a simple fluid transport model for all the four types of fluids we studied, which cover a wide span of surface tension (water, ethanol, isopropanol (IPA), FC-72). More importantly, PCHT conditions at the transition points between boiling and evaporation were close to those of the kinetic limit of all these fluids. However, further increase of the heat flux beyond the transition points led to decreasing HTC and deviation from the kinetic limit, which can be attributed to the increasing vapor resistance in the vapor space and inside the nanopores. This increasing vapor resistance was also confirmed by experiments on IPA with different vapor pressures.

preprint2017arXiv

Decentralized Charging Control of Electric Vehicles in Residential Distribution Networks

Electric vehicle (EV) charging can negatively impact electric distribution networks by exceeding equipment thermal ratings and causing voltages to drop below standard ranges. In this paper, we develop a decentralized EV charging control scheme to achieve "valley-filling" (i.e., flattening demand profile during overnight charging), meanwhile meeting heterogeneous individual charging requirements and satisfying distribution network constraints. The formulated problem is an optimization problem with a non-separable objective function and strongly coupled inequality constraints. We propose a novel shrunken primal-dual subgradient (SPDS) algorithm to support the decentralized control scheme, derive conditions guaranteeing its convergence, and verify its efficacy and convergence with a representative distribution network model.