Source author record

Xiaofeng Cao

Xiaofeng Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: astro-ph, Computation and Language, Machine Learning, Networking and Internet Architecture

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We attribute this dilemma to a train-inference mismatch amplified by irreversible decoding. While training reconstructs tokens from randomly corrupted states, efficient inference requires an adaptive denoising order, where easier tokens are revealed earlier and context-dependent ones are deferred. This view motivates two complementary methods: an inference-time method that makes parallel decoding revokable, and a training-time extension that distills the reliable order exposed by this revokable process. Accordingly, we first propose Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable parallel generation. WINO aggressively drafts multiple tokens, verifies generated tokens with enriched global context, and re-masks unreliable ones for later refinement. Building on this discovered order, we further introduce WINO+, which injects the verified denoising trajectories produced by WINO into model parameters, aligning training with efficient inference. Experiments on LLaDA and MMaDA show that WINO improves both quality and efficiency, while WINO+ further strengthens this progression. On GSM8K, WINO improves accuracy from 73.24% to 75.82% with a 6.10x step reduction, and WINO+ further achieves 76.58% with a 6.83x reduction. On Flickr30K, WINO+ reaches a 16.22x step reduction with improved CIDEr. These results demonstrate that DLLMs can serve as their own efficiency teachers by first discovering reliable denoising orders through revokable decoding and then learning to follow them for faster generation. Code is available at https://github.com/Feng-Hong/WINO-DLLM/tree/WINO-plus.
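
The draft, verify and re-mask cycle described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the WINO code base: the names `draft_verify_remask` and `toy_scorer`, the two thresholds, the rule that always keeps the best drafted token so the loop terminates, and the toy scorer that stands in for a diffusion LLM.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that has not been revealed yet
Scorer = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def toy_scorer(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a model: returns (token, confidence) per position.

    Even positions are "easy"; odd positions are only predicted correctly
    once their left neighbour has been revealed correctly.
    """
    target = "abcd"
    out = []
    for i, tok in enumerate(seq):
        if tok is not MASK:
            # Re-score an already revealed token against the current context.
            out.append((tok, 0.9 if tok == target[i] else 0.1))
        elif i % 2 == 0 or seq[i - 1] == target[i - 1]:
            out.append((target[i], 0.9))
        else:
            out.append(("?", 0.6))  # a guess made without enough context
    return out


def draft_verify_remask(score: Scorer, length: int,
                        draft_tau: float = 0.5,
                        verify_tau: float = 0.8) -> Tuple[List[str], int]:
    """Reveal many tokens per step with a lenient threshold, then re-mask
    the ones that fail a stricter check once more context is visible."""
    seq: List[Optional[str]] = [MASK] * length
    steps = 0
    while MASK in seq:
        steps += 1
        preds = score(seq)
        masked = [i for i, t in enumerate(seq) if t is MASK]
        # Draft: reveal every masked position above the lenient threshold.
        drafted = [i for i in masked if preds[i][1] >= draft_tau]
        if not drafted:
            drafted = [max(masked, key=lambda i: preds[i][1])]
        for i in drafted:
            seq[i] = preds[i][0]
        # Verify: re-score the drafted tokens with the enriched context.
        checks = score(seq)
        best = max(drafted, key=lambda i: checks[i][1])
        for i in drafted:
            # Re-mask unreliable drafts; keeping the best one guarantees progress.
            if i != best and checks[i][1] < verify_tau:
                seq[i] = MASK
    return seq, steps


tokens, steps = draft_verify_remask(toy_scorer, length=4)
print("".join(tokens), steps)
```

In this toy setting the loop needs two steps for four tokens, because the two context-dependent positions are drafted wrongly in the first step, re-masked, and then revealed correctly in the second.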

preprint · 2022 · arXiv

Data-Efficient Learning via Minimizing Hyperspherical Energy

Deep learning on large-scale data is dominant nowadays. The unprecedented scale of data has been arguably one of the most important driving forces for the success of deep learning. However, there still exist scenarios where collecting data or labels could be extremely expensive, e.g., medical imaging and robotics. To fill this gap, this paper considers the problem of data-efficient learning from scratch using a small amount of representative data. First, we characterize this problem by active learning on homeomorphic tubes of spherical manifolds. This naturally generates a feasible hypothesis class. With homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm, and provide comprehensive theoretical guarantees for MHEAL, covering convergence and generalization analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide range of applications on data-efficient learning, including deep clustering, distribution matching, version space sampling and deep active learning.
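
As a rough illustration of the quantity being minimized, the sketch below computes a hyperspherical energy for a set of points. It assumes the commonly used Riesz s-energy form, the sum over point pairs of the inverse distance raised to the power s after projecting each point onto the unit sphere. The abstract does not state the exact definition used by MHEAL, so the formula, the function names and the default `s=1.0` are assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def hyperspherical_energy(points: Sequence[Sequence[float]], s: float = 1.0) -> float:
    """Assumed Riesz s-energy: sum over pairs of ||x_i - x_j|| ** (-s).

    Coincident points have zero distance and raise ZeroDivisionError.
    """
    unit = [normalize(p) for p in points]
    return sum(math.dist(a, b) ** -s for a, b in combinations(unit, 2))


# Four points spread evenly on the unit circle.
spread = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
# Four points bunched together within 0.3 radians.
clustered = [(math.cos(t), math.sin(t)) for t in (0.0, 0.1, 0.2, 0.3)]

e_spread = hyperspherical_energy(spread)
e_clustered = hyperspherical_energy(clustered)
print(e_spread < e_clustered)
```

Evenly spread points give a lower energy than bunched ones, which is why minimizing this quantity favours a small but well-distributed, representative sample.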

preprint · 2020 · arXiv

Edge Federation: Towards an Integrated Service Provisioning Model

Edge computing is a promising computing paradigm for pushing the cloud service to the network edge. To this end, edge infrastructure providers (EIPs) need to bring computation and storage resources to the network edge and allow edge service providers (ESPs) to provision latency-critical services to users. Currently, EIPs prefer to establish a series of private edge-computing environments to serve specific requirements of users. This kind of resource provisioning mechanism severely limits the development and spread of edge computing for serving diverse user requirements. To this end, we propose an integrated resource provisioning model, named edge federation, to seamlessly realize the resource cooperation and service provisioning across standalone edge computing providers and clouds. To efficiently schedule and utilize the resources across multiple EIPs, we systematically characterize the provisioning process as a large-scale linear programming (LP) problem and transform it into an easily solved form. Accordingly, we design a dynamic algorithm to tackle the varying service demands from users. We conduct extensive experiments over the base station networks in Toronto city. Compared with the existing fixed contract model and multihoming model, edge federation can reduce the overall cost of EIPs by 23.3% to 24.5%, and 15.5% to 16.3%, respectively.

preprint · 2008 · arXiv

Frequency Variation of the Kilohertz Quasi-periodic Oscillations and the Flux of the Band-limited Noise in Scorpius X-1

We study the kilohertz quasi-periodic oscillations (kHz QPOs) and the band-limited noise (BLN) in the 0.5-16 Hz range observed simultaneously on the horizontal branch (HB) and on the upper normal branch (NB) of the brightest neutron star Low-mass X-ray Binary (LMXB) Scorpius X-1, using observations performed with the Rossi X-Ray Timing Explorer (RXTE). We find that the twin kHz QPO frequencies are positively correlated with the flux variations taking place on the BLN time scales on the HB, in contrast to the anti-correlation held on the time scale of the normal branch oscillation (NBO) on the NB reported previously, suggesting that although they occur in sequence along the color-color tracks, the BLN and the NBO are of different origins. We also show evidence that the frequency separation between the twin kHz QPOs decreases with the flux by 2 to 3 Hz on the BLN time scales, which is consistent with the trend on the longer time scale on which the Z source traces the HB. This further suggests that the flux variation associated with the BLN originates from the mass accretion rate variation in the disk accretion flow. We discuss the implications of these results for our understanding of the BLN.