Source author record

Yao Tang

Yao Tang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Information Theory math.IT Artificial Intelligence Computation and Language eess.SP Machine Learning

Catalog footprint

What is connected

5works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.

preprint2026arXiv

The Velocity Deficit: Initial Energy Injection for Flow Matching

While Flow Matching theoretically guarantees constant-velocity trajectories, we identify a critical breakdown in high-dimensional practice: the Velocity Deficit. We show that the MSE objective systematically underestimates velocity magnitude, causing generated samples to fail to reach the data manifold-a phenomenon we term Integration Lag. To rectify this, we propose Initial Energy Injection, instantiated via two complementary methods: the training-based Magnitude-Aware Flow Matching (MAFM) and the training-free Scale Schedule Corrector (SSC). Both are grounded in our discovery of a crucial asymmetry: velocity contraction causes harmful kinetic stagnation at the trajectory's start, yet acts as a beneficial denoising mechanism at its end. Empirically, SSC yields significant efficiency gains with zero retraining and just one line of code. On ImageNet-1k (256x256), it improves FID by 44.6% (from 13.68 to 7.58) and achieves a 5x speedup, enabling a 50-step generator (FID 7.58) to beat a 250-step baseline (FID 8.65). Furthermore, our methods generalize to Text-to-Image tasks and high-resolution generation, improving FID on MS-COCO by ~22%.

preprint2022arXiv

Efficient One Pass Self-distillation with Zipf's Label Smoothing

Self-distillation exploits non-uniform soft supervision from itself during training and improves performance without any runtime cost. However, the overhead during training is often overlooked, and yet reducing time and memory overhead during training is increasingly important in the giant models' era. This paper proposes an efficient self-distillation method named Zipf's Label Smoothing (Zipf's LS), which uses the on-the-fly prediction of a network to generate soft supervision that conforms to Zipf distribution without using any contrastive samples or auxiliary parameters. Our idea comes from an empirical observation that when the network is duly trained the output values of a network's final softmax layer, after sorting by the magnitude and averaged across samples, should follow a distribution reminiscent to Zipf's Law in the word frequency statistics of natural languages. By enforcing this property on the sample level and throughout the whole training period, we find that the prediction accuracy can be greatly improved. Using ResNet50 on the INAT21 fine-grained classification dataset, our technique achieves +3.61% accuracy gain compared to the vanilla baseline, and 0.88% more gain against the previous label smoothing or self-distillation strategies. The implementation is publicly available at https://github.com/megvii-research/zipfls.

preprint2019arXiv

Trajectory Design for Multiple-UAV Assisted Wireless Networks

Unmanned aerial vehicles (UAVs) can enhance the performance of cellular networks, due to their high mobility and efficient deployment. In this paper, we present a first study on how the user mobility affects the UAVs' trajectories of a multiple-UAV assisted wireless communication system. Specifically, we consider the UAVs are deployed as aerial base stations to serve ground users who move between different regions. We maximize the throughput of ground users in the downlink communication by optimizing the UAVs' trajectories, while taking into account the impact of the user mobility, propulsion energy consumption, and UAVs' mutual interference. We formulate the problem as a route selection problem in an acyclic directed graph. Each vertex represents a task associated with a reward on the average user throughput in a region-time point, while each edge is associated with a cost on the energy propulsion consumption during flying and hovering. For the centralized trajectory design, we first propose the shortest path scheme that determines the optimal trajectory for the single UAV case. We also propose the centralized route selection (CRS) scheme to systematically compute the optimal trajectories for the more general multiple-UAV case. Due to the NP-hardness of the centralized problem, we consider the distributed trajectory design that each UAV selects its trajectory autonomously and propose the distributed route selection (DRS) scheme, which will converge to a pure strategy Nash equilibrium within a finite number of iterations.

preprint2013arXiv

A Partial Decode-Forward Scheme For A Network with N relays

We study a discrete-memoryless relay network consisting of one source, one destination and N relays, and design a scheme based on partial decode-forward relaying. The source splits its message into one common and N+1 private parts, one intended for each relay. It encodes these message parts using Nth-order block Markov coding, in which each private message part is independently superimposed on the common parts of the current and N previous blocks. Using simultaneous sliding window decoding, each relay fully recovers the common message and its intended private message with the same block index, then forwards them to the following nodes in the next block. This scheme can be applied to any network topology. We derive its achievable rate in a compact form. The result reduces to a known decode-forward lower bound for an N-relay network and partial decode-forward lower bound for a two-level relay network. We then apply the scheme to a Gaussian two-level relay network and obtain its capacity lower bound considering power constraints at the transmitting nodes.