Source author record

Tianyu Wu

Tianyu Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Distributed, Parallel, and Cluster Computing; Information Theory; Machine Learning; math.IT; math.OC; Artificial Intelligence; Computation and Language; Computational Engineering, Finance, and Science; eess.AS; eess.SP; math.NA; Molecular Networks; Multiagent Systems; Quantitative Methods; Sound

Catalog footprint

What is connected

7 works

15 topics

4 close collaborators

preprint, 2026, arXiv

D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting

Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks. However, existing multi-token drafter objectives often use fixed position-dependent weighting schedules, such as head-dependent weights or block-position decays, which do not adapt as the positions limiting acceptance change during training. To address this, we derive per-position training weights from a differentiable surrogate of expected accepted draft length, matching the weight of each position to its log-probability gradient contribution. The resulting loss, D-PACE (Dynamic Position-Aware Cross-Entropy), shifts training signal toward positions that currently limit acceptance as the drafter improves. Across six benchmarks, two Qwen3-4B draft depths, two decoding temperatures, and two additional target models, D-PACE consistently improves both wall-clock speedup and average emitted length, with 2.3% measured training-time overhead and no changes to the drafter architecture or inference procedure.
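
The weighting idea can be illustrated with a small sketch. It rests on a simplifying assumption that is not taken from the abstract: position k is accepted independently with probability p[k], and drafting stops at the first rejection. Under that assumption the expected accepted length is the sum of prefix products of p, and its gradient with respect to log p[i] is the sum of all prefix products that include position i. The function names and the normalization to unit sum are hypothetical, and the actual surrogate used by D-PACE may differ.

```python
from itertools import accumulate
from operator import mul


def expected_accepted_length(p):
    """Expected number of accepted draft tokens, assuming position k is
    accepted with probability p[k] and drafting stops at the first rejection."""
    return sum(accumulate(p, mul))


def position_weights(p):
    """Per-position weights proportional to d E[length] / d log p[i].

    Illustrative assumption only: weights are normalized to sum to 1.
    """
    prefix = list(accumulate(p, mul))  # prefix[k] = p[0] * ... * p[k]
    # Suffix sums: grads[i] = sum of prefix[k] over all k >= i.
    grads = list(accumulate(reversed(prefix)))[::-1]
    total = sum(grads)
    if total == 0.0:
        # Degenerate case (first position never accepted): fall back to uniform.
        return [1.0 / len(p)] * len(p)
    return [g / total for g in grads]
```

In this toy model the weights are recomputed from the current acceptance probabilities, so they move as the drafter changes: raising every p[k] increases the share of weight assigned to later positions.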

preprint, 2026, arXiv

Markovian Promoter Models: A Mechanistic Alternative to Hill Functions in Gene Regulatory Networks

Gene regulatory networks are typically modeled using ordinary differential equations (ODEs) with phenomenological Hill functions to represent transcriptional regulation. While computationally efficient, Hill functions lack mechanistic grounding and cannot capture stochastic promoter dynamics. We present a hybrid Markovian-ODE framework that explicitly models discrete promoter states while maintaining computational tractability. Uniquely, we parameterize this model using fractional dwell times derived from ChEC-seq data, enabling the inference of in vivo kinetic rates from steady-state chromatin profiling. Our approach tracks individual transcription factor binding events as a continuous-time Markov chain, linked to deterministic molecular dynamics. We validate this framework on seven gene regulatory systems spanning basic to advanced complexity: the GAL system, repressilator, Goodwin oscillator, toggle switch, incoherent feed-forward loop, p53-Mdm2 oscillator, and NF-κB pathway. Comparison with stochastic simulation algorithm (SSA) ground truth demonstrates that Markovian promoter models achieve similar accuracy to full stochastic simulations while being 10-100× faster. Our framework provides a mechanistic foundation for gene regulation modeling and enables investigation of promoter-level stochasticity in complex regulatory networks.
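
As a rough illustration of how a promoter Markov chain relates to a Hill function, the sketch below uses the simplest possible case: a promoter with two states (unbound and bound), a binding rate proportional to transcription factor concentration, and a constant unbinding rate. This two-state model, the parameter names `k_on` and `k_off`, and the helper `rate_ratio_from_fraction` are assumptions made for the example and are not the paper's parameterization. In steady state the fraction of time spent bound equals a Hill function with coefficient 1 and K = k_off / k_on, and a measured bound fraction fixes only the ratio of the effective binding rate to the unbinding rate.

```python
def hill(c, K, n=1.0):
    """Phenomenological Hill activation at concentration c."""
    return c**n / (K**n + c**n)


def bound_fraction(c, k_on, k_off):
    """Steady-state fraction of time a two-state promoter is bound.

    Unbound -> bound at rate k_on * c, bound -> unbound at rate k_off.
    """
    rate_bind = k_on * c
    return rate_bind / (rate_bind + k_off)


def rate_ratio_from_fraction(f):
    """Ratio (k_on * c) / k_off implied by a bound fraction f, with 0 <= f < 1.

    Hypothetical helper: a steady-state fraction alone determines this ratio,
    not the two rates separately.
    """
    if not 0.0 <= f < 1.0:
        raise ValueError("bound fraction must lie in [0, 1)")
    return f / (1.0 - f)
```

For example, `bound_fraction(2.0, k_on=0.5, k_off=1.0)` and `hill(2.0, K=2.0)` agree, since K = k_off / k_on = 2.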

preprint, 2022, arXiv

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These improvements have mostly been reported on highly curated datasets such as LibriSpeech for non-streaming end-to-end ASR models. However, the pivotal characteristic of SSL is that it can be applied to any untranscribed audio data. In this paper, we provide a full exploration of how to utilize uncurated audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model. More specifically, we present (1) the effect of an Audio Event Detection (AED) model in the data pre-processing pipeline, (2) an analysis of the choice of optimizer and learning rate schedule, (3) a comparison of recently developed contrastive losses, and (4) a comparison of various pre-training strategies, such as in-domain versus out-of-domain pre-training data, monolingual versus multilingual pre-training data, multi-head versus single-head multilingual SSL, and supervised pre-training versus SSL. The experimental results show that SSL pre-training with in-domain uncurated data achieves better performance than all the alternative out-of-domain pre-training strategies.

preprint, 2016, arXiv

Coordinate Friendly Structures, Algorithms and Applications

This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize. The performance of coordinate update methods depends on the subproblems being simple to solve. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate-friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.
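
A minimal sketch of a coordinate update, using cyclic coordinate descent on a least-squares problem as a standard textbook instance rather than any algorithm from the paper. The function name and the dense list-of-lists matrix format are assumptions made for the example. Each update minimizes the objective exactly along one coordinate, and keeping the residual r = A x - b up to date makes one update cost O(m) instead of O(m n), which is the kind of low-cost update the abstract refers to.

```python
def coordinate_descent_least_squares(A, b, sweeps=100):
    """Minimize 0.5 * ||A x - b||^2 by cyclic exact coordinate updates.

    A is a list of m rows, each a list of n floats; b has length m.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-bi for bi in b]  # residual r = A x - b, with x = 0 initially
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # column of zeros: x[j] does not affect the objective
            # Partial derivative with respect to x[j] is A[:, j] . r
            grad_j = sum(A[i][j] * r[i] for i in range(m))
            step = grad_j / col_sq[j]  # exact minimizer along coordinate j
            x[j] -= step
            for i in range(m):
                r[i] -= step * A[i][j]  # keep the residual consistent with x
    return x
```

With `A = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]` and `b = [1.0, 3.0, 2.0]`, the iterates approach the exact solution `[1.0, 2.0]`.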

preprint, 2016, arXiv

Expander Graph and Communication-Efficient Decentralized Optimization

In this paper, we discuss how to design the graph topology to reduce the communication complexity of certain algorithms for decentralized optimization. Our goal is to minimize the total communication needed to achieve a prescribed accuracy. We discover that the so-called expander graphs are near-optimal choices. We propose three approaches to construct expander graphs for different numbers of nodes and node degrees. Our numerical results show that the performance of decentralized optimization is significantly better on expander graphs than on other regular graphs.
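
The effect of topology at a fixed node degree can be seen in a toy experiment. The sketch below runs decentralized averaging, in which each node repeatedly moves toward its neighbors' values, on two 4-regular circulant graphs with 16 nodes. These graphs are stand-ins chosen for simplicity and are not the paper's expander constructions; a ring with longer chords is only a rough proxy for a better-connected topology. The function names, the mixing rule, and the tolerance are likewise assumptions made for the example.

```python
def circulant_neighbors(n, offsets):
    """Neighbor lists of a circulant graph: node i is linked to i +/- s (mod n)."""
    return [
        sorted({(i + s) % n for s in offsets} | {(i - s) % n for s in offsets})
        for i in range(n)
    ]


def rounds_to_consensus(neighbors, values, tol=1e-6, max_rounds=10_000):
    """Count averaging rounds until every node is within tol of the mean.

    Each round, node i applies x_i += sum_j (x_j - x_i) / (deg_i + 1) over its
    neighbors j. On a regular graph this preserves the average of the values.
    """
    n = len(values)
    mean = sum(values) / n
    x = list(values)
    for t in range(max_rounds + 1):
        if max(abs(v - mean) for v in x) <= tol:
            return t
        x = [
            x[i] + sum(x[j] - x[i] for j in neighbors[i]) / (len(neighbors[i]) + 1)
            for i in range(n)
        ]
    raise RuntimeError("no consensus within max_rounds")


start = [float(i) for i in range(16)]
local_rounds = rounds_to_consensus(circulant_neighbors(16, (1, 2)), start)
chord_rounds = rounds_to_consensus(circulant_neighbors(16, (1, 4)), start)
```

Both graphs use four links per node, so each round costs the same amount of communication, yet the graph with longer chords needs fewer rounds (`chord_rounds < local_rounds`) and therefore less total communication to reach the same accuracy.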

preprint, 2010, arXiv

Design and Analysis of Multi-User SDMA Systems with Noisy Limited CSIT Feedback

In this paper, we consider spatial-division multiple-access (SDMA) systems with one multi-antenna base station and a number of single-antenna mobiles under noisy limited CSIT feedback. We propose a robust noisy limited feedback design for SDMA systems. The solution consists of real-time robust SDMA precoding, user selection and rate adaptation, as well as an offline feedback index assignment algorithm. The index assignment problem is cast as a Traveling Salesman Problem (TSP). Based on the specific structure of the feedback constellation and the precoder, we derive a low-complexity but asymptotically optimal solution. Simulation results show that the proposed framework has significant goodput gain compared to traditional naive designs under a noisy limited feedback channel. Furthermore, we show that despite the noisy feedback channel, the average SDMA system goodput grows with the number of feedback bits in the interference-limited regime, while in the noise-limited regime it increases linearly with the number of transmit antennas and the forward channel SNR.

preprint, 2009, arXiv

A Low-Overhead Energy Detection Based Cooperative Sensing Protocol for Cognitive Radio Systems

Cognitive radio and dynamic spectrum access represent a paradigm shift toward more effective use of limited radio spectrum. One core component of dynamic spectrum access is sensing primary user activity in the shared spectrum. A conventional framework of distributed sensing and centralized decision involving multiple sensor nodes has been proposed to enhance sensing performance. However, it is difficult to apply the conventional schemes in practice, since the overhead in sensing measurement and sensing reporting, as well as in sensing report combining, limits the number of sensor nodes that can participate in distributed sensing. In this paper, we propose a novel, low-overhead and low-complexity energy detection based cooperative sensing framework for cognitive radio systems which addresses the above two issues. The energy detection based cooperative sensing scheme greatly reduces the quiet period overhead (for sensing measurement) as well as the sensing reporting overhead of the secondary systems, and the power scheduling algorithm dynamically allocates the transmission power of the cooperative sensor nodes based on the channel statistics of the links to the BS as well as the quality of the sensing measurement. In order to obtain design insights, we also derive the asymptotic sensing performance of the proposed cooperative sensing framework based on the mobility model. We show that the false alarm and mis-detection performance of the proposed cooperative sensing framework improves as the number of cooperative sensor nodes increases.