Researcher profile

Yu Yao

Yu Yao contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works
0 followers
13 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting

Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks. However, existing multi-token drafter objectives often use fixed position-dependent weighting schedules, such as head-dependent weights or block-position decays, which do not adapt as the positions limiting acceptance change during training. To address this, we derive per-position training weights from a differentiable surrogate of expected accepted draft length, matching the weight of each position to its log-probability gradient contribution. The resulting loss, D-PACE (Dynamic Position-Aware Cross-Entropy), shifts training signal toward positions that currently limit acceptance as the drafter improves. Across six benchmarks, two Qwen3-4B draft depths, two decoding temperatures, and two additional target models, D-PACE consistently improves both wall-clock speedup and average emitted length, with 2.3% measured training-time overhead and no changes to the drafter architecture or inference procedure.
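
The weighting idea in this abstract can be illustrated with a small sketch. The abstract does not state the exact surrogate, so the code below assumes a common simplification: each draft position i is accepted independently with probability p_i, and the expected accepted length is the sum over k of the product p_1 ... p_k. Under that assumption, the gradient of the surrogate with respect to log p_i is the sum of all prefix products that include position i, and this is used as the per-position weight. The function names are hypothetical and this is not the authors' implementation.

```python
from typing import List


def expected_accepted_length(p: List[float]) -> float:
    """Assumed surrogate: sum over k of prod_{j<=k} p_j.

    p[i] is the (assumed independent) probability that draft position i
    is accepted, given that all earlier positions were accepted.
    """
    total, prefix = 0.0, 1.0
    for pk in p:
        prefix *= pk      # probability that positions 0..k are all accepted
        total += prefix
    return total


def position_weights(p: List[float]) -> List[float]:
    """Per-position weights w_i = d(surrogate) / d(log p_i).

    Every prefix product that contains p_i is linear in p_i, so
    w_i = sum_{k>=i} prod_{j<=k} p_j, i.e. a suffix sum of prefix products.
    """
    prefixes: List[float] = []
    prefix = 1.0
    for pk in p:
        prefix *= pk
        prefixes.append(prefix)

    weights = [0.0] * len(p)
    running = 0.0
    for i in range(len(p) - 1, -1, -1):  # accumulate from the last position
        running += prefixes[i]
        weights[i] = running
    return weights
```

For p = [0.5, 0.5] the assumed surrogate is 0.75 and the weights are [0.75, 0.25]. Because the weights are recomputed from the current acceptance probabilities, they change as the drafter improves, which is the dynamic behaviour the abstract contrasts with fixed schedules.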

preprint · 2024 · arXiv

Rethinking the Paradigm of Content Constraints in Unpaired Image-to-Image Translation

In an unpaired setting, lacking sufficient content constraints for image-to-image translation (I2I) tasks, GAN-based approaches are usually prone to model collapse. Current solutions can be divided into two categories: reconstruction-based and Siamese network-based. The former requires that the transformed or transforming image can be perfectly converted back to the original image, which is sometimes too strict and limits the generative performance. The latter involves feeding the original and generated images into a feature extractor and then matching their outputs. This is not efficient enough, and a universal feature extractor is not easily available. In this paper, we propose EnCo, a simple but efficient way to maintain the content by constraining the representational similarity in the latent space of patch-level features from the same stage of the Encoder and deCoder of the generator. For the similarity function, we use a simple MSE loss instead of the contrastive loss that is currently widely used in I2I tasks. Benefiting from this design, EnCo training is extremely efficient, while the features from the encoder produce a more positive effect on the decoding, leading to more satisfying generations. In addition, we rethink the role played by discriminators in sampling patches and propose a discriminative attention-guided (DAG) patch sampling strategy to replace random sampling. DAG is parameter-free and requires only negligible computational overhead, while significantly improving the performance of the model. Extensive experiments on multiple datasets demonstrate the effectiveness and advantages of EnCo, and we achieve multiple state-of-the-art results compared to previous methods. Our code is available at https://github.com/XiudingCai/EnCo-pytorch.

preprint · 2023 · arXiv

New Binary Quantum Codes Constructed from Quasi-Cyclic Codes

It is well known that quantum codes can be constructed by means of classical symplectic dual-containing codes. This paper considers a family of two-generator quasi-cyclic codes and derives sufficient conditions for these codes to be symplectic dual-containing. Then, a new method for constructing binary quantum codes using symplectic dual-containing codes is proposed. As an application, we construct 8 binary quantum codes that exceed the best-known results. Further, another 36 new binary quantum codes are obtained by propagation rules, all of which improve the lower bound on the minimum distances.

preprint · 2022 · arXiv

A Many-ported and Shared Memory Architecture for High-Performance ADAS SoCs

Increasing investment in computing technologies and advancements in silicon technology have fueled rapid growth in advanced driver assistance systems (ADAS) and corresponding SoC developments. An ADAS SoC represents a heterogeneous architecture that consists of CPUs, GPUs and artificial intelligence (AI) accelerators. In order to guarantee its safety and reliability, it must process massive amounts of raw data collected from multiple redundant sources such as high-definition video cameras, radars, and lidars to recognize objects correctly and to make the right decisions promptly. A domain-specific memory architecture is essential to achieve the above goals. We present a shared memory architecture that enables high data throughput among multiple parallel accesses native to the ADAS applications. It also provides deterministic access latency with proper isolation under the stringent real-time QoS constraints. A prototype is built and analyzed. The results validate that the proposed architecture provides close to 100% throughput for both read and write accesses generated simultaneously by many accessing masters with full injection rate. It can also provide consistent QoS to the domain-specific payloads while enabling the scalability and modularity of the design.

preprint · 2022 · arXiv

Do We Need to Penalize Variance of Losses for Learning with Label Noise?

Algorithms which minimize the averaged loss have been widely designed for dealing with noisy labels. Intuitively, when there is a finite training sample, penalizing the variance of losses will improve the stability and generalization of the algorithms. Interestingly, we found that the variance should be increased for the problem of learning with noisy labels. Specifically, increasing the variance will boost the memorization effects and reduce the harmfulness of incorrect labels. By exploiting the label noise transition matrix, regularizers can be easily designed to reduce the variance of losses and be plugged in many existing algorithms. Empirically, the proposed method by increasing the variance of losses significantly improves the generalization ability of baselines on both synthetic and real-world datasets.

preprint · 2022 · arXiv

Emulating Quantum Dynamics with Neural Networks via Knowledge Distillation

High-fidelity quantum dynamics emulators can be used to predict the time evolution of complex physical systems. Here, we introduce an efficient training framework for constructing machine learning-based emulators. Our approach is based on the idea of knowledge distillation and uses elements of curriculum learning. It works by constructing a set of simple, but rich-in-physics training examples (a curriculum). These examples are used by the emulator to learn the general rules describing the time evolution of a quantum system (knowledge distillation). The goal is not only to obtain high-quality predictions, but also to examine the process of how the emulator learns the physics of the underlying problem. This allows us to discover new facts about the physical system, detect symmetries, and measure the relative importance of the contributing physical processes. We illustrate this approach by training an artificial neural network to predict the time evolution of quantum wave packets propagating through a potential landscape. We focus on the question of how the emulator learns the rules of quantum dynamics from the curriculum of simple training examples and to what extent it can generalize the acquired knowledge to solve more challenging cases.

preprint · 2022 · arXiv

Instance-dependent Label-noise Learning under a Structural Causal Model

Label noise degrades the performance of deep learning algorithms because deep neural networks easily overfit label errors. Let X and Y denote the instance and clean label, respectively. When Y is a cause of X, according to which many datasets have been constructed, e.g., SVHN and CIFAR, the distributions of P(X) and P(Y|X) are entangled. This means that the unsupervised instances are helpful to learn the classifier and thus reduce the side effect of label noise. However, it remains unclear how to exploit the causal information to handle the label noise problem. In this paper, by leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning. In particular, we show that properly modeling the instances will contribute to the identifiability of the label noise transition matrix and thus lead to a better classifier. Empirically, our method outperforms all state-of-the-art methods on both synthetic and real-world label-noise datasets.

preprint · 2022 · arXiv

Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos

Multimodal sentiment analysis in videos is a key task in many real-world applications, which usually requires integrating multimodal streams including visual, verbal and acoustic behaviors. To improve the robustness of multimodal fusion, some of the existing methods let different modalities communicate with each other and model the crossmodal interaction via transformers. However, these methods use only single-scale representations during the interaction and fail to exploit multi-scale representations that contain different levels of semantic information. As a result, the representations learned by transformers could be biased, especially for unaligned multimodal data. In this paper, we propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis. On the whole, the "multi-scale" mechanism is capable of exploiting the different levels of semantic information of each modality, which are used for fine-grained crossmodal interactions. Meanwhile, each modality learns its feature hierarchies via integrating the crossmodal interactions from multiple level features of its source modality. In this way, each pair of modalities progressively builds feature hierarchies respectively in a cooperative manner. The empirical results illustrate that our MCMulT model not only outperforms existing approaches on unaligned multimodal sequences but also has strong performance on aligned multimodal sequences.

preprint · 2022 · arXiv

Rethinking Class-Prior Estimation for Positive-Unlabeled Learning

Given only positive (P) and unlabeled (U) data, PU learning can train a binary classifier without any negative data. It has two building blocks: PU class-prior estimation (CPE) and PU classification; the latter has been well studied while the former has received less attention. Hitherto, the distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution. If this is violated, those CPE methods will systematically overestimate the class prior; it is even worse that we cannot verify the assumption based on the data. In this paper, we rethink CPE for PU learning: can we remove the assumption to make CPE always valid? We show an affirmative answer by proposing Regrouping CPE (ReCPE) that builds an auxiliary probability distribution such that the support of the positive data distribution is never contained in the support of the negative data distribution. ReCPE can work with any CPE method by treating it as the base method. Theoretically, ReCPE does not affect its base if the assumption already holds for the original probability distribution; otherwise, it reduces the positive bias of its base. Empirically, ReCPE improves all state-of-the-art CPE methods on various datasets, implying that the assumption has indeed been violated here.

preprint · 2021 · arXiv

Smart Black Box 2.0: Efficient High-bandwidth Driving Data Collection based on Video Anomalies

Autonomous vehicles require fleet-wide data collection for continuous algorithm development and validation. The Smart Black Box (SBB) intelligent event data recorder has been proposed as a system for prioritized high-bandwidth data capture. This paper extends the SBB by applying anomaly detection and action detection methods for generalized event-of-interest (EOI) detection. An updated SBB pipeline is proposed for the real-time capture of driving video data. A video dataset is constructed to evaluate the SBB on real-world data for the first time. SBB performance is assessed by comparing the compression of normal and anomalous data and by comparing our prioritized data recording with a FIFO strategy. Results show that SBB data compression can increase the anomalous-to-normal memory ratio by ~25%, while the prioritized recording strategy increases the anomalous-to-normal count ratio when compared to a FIFO strategy. We compare the real-world dataset SBB results to a baseline SBB given ground-truth anomaly labels and conclude that improved general EOI detection methods will greatly improve SBB performance.

preprint · 2020 · arXiv

Extended quasi-cyclic constructions of quantum codes and entanglement-assisted quantum codes

Construction of quantum codes and entanglement-assisted quantum codes with good parameters via classical codes is an important task for quantum computing and quantum information. In this paper, by a family of one-generator quasi-cyclic codes, we provide quasi-cyclic extended constructions that preserve the self-orthogonality to obtain stabilizer quantum codes. As for the computational results, some binary and ternary stabilizer codes with good parameters are constructed. Moreover, we present methods to construct maximal-entanglement entanglement-assisted quantum codes by means of the class of quasi-cyclic codes and their extended codes. As an application, some good maximal-entanglement entanglement-assisted quantum codes are obtained and their parameters are compared.

preprint · 2020 · arXiv

Numerical simulations of surf zone wave dynamics using Smoothed Particle Hydrodynamics

In this study we investigated the capabilities of the mesh-free, Lagrangian particle method (Smoothed Particle Hydrodynamics, SPH) to simulate the detailed hydrodynamic processes generated by both spilling and plunging breaking waves within the surf zone. The weakly-compressible SPH code DualSPHysics was applied to simulate wave breaking over two distinct bathymetric profiles (a plane beach and fringing reef) and compared to experimental flume measurements of waves, flows, and mean water levels. Despite the simulations spanning very different wave breaking conditions (including an extreme case with violently plunging waves on an effectively dry reef slope), the model was able to reproduce a wide range of relevant surf zone hydrodynamic processes using a fixed set of numerical parameters. This included accurate predictions of the nonlinear evolution of wave shapes (e.g., asymmetry and skewness properties), rates of wave dissipation within the surf zone, and wave setup distributions. By using this mesh-free approach, the model was able to resolve the critical crest region within the breaking waves, which provided robust predictions of the wave-induced mass fluxes within the surf zone responsible for the undertow. Within this breaking crest region, the model results capture how the potential energy of the organized wave motion is initially converted to kinetic energy and then dissipated, which reproduces the distribution of wave forces responsible for wave setup generation across the surf zone. Overall, the results reveal how the mesh-free SPH approach can accurately reproduce the detailed wave breaking processes with comparable skill to state-of-the-art mesh-based Computational Fluid Dynamics (CFD) models, and thus can be applied to provide valuable new physical insight into surf zone dynamics.

preprint · 2020 · arXiv

When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos

Video anomaly detection (VAD) has been extensively studied. However, research on egocentric traffic videos with dynamic scenes lacks large-scale benchmark datasets as well as effective evaluation metrics. This paper proposes traffic anomaly detection with a when-where-what pipeline to detect, localize, and recognize anomalous events from egocentric videos. We introduce a new dataset called Detection of Traffic Anomaly (DoTA) containing 4,677 videos with temporal, spatial, and categorical annotations. A new spatial-temporal area under curve (STAUC) evaluation metric is proposed and used with DoTA. State-of-the-art methods are benchmarked for two VAD-related tasks. Experimental results show STAUC is an effective VAD metric. To our knowledge, DoTA is the largest traffic anomaly dataset to date and is the first supporting traffic anomaly studies across when-where-what perspectives. Our code and dataset can be found at: https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly

preprint · 2019 · arXiv

Two-body charmed baryon decays involving vector meson with $SU(3)$ flavor symmetry

We study the two-body anti-triplet charmed baryon decays of ${\bf B}_c\to {\bf B}_n V$, with ${\bf B}_c=(Ξ_c^{0},-Ξ_c^{+},Λ_c^+)$ and ${\bf B}_n(V)$ the baryon (vector meson) states. Based on the $SU(3)$ flavor symmetry, we predict that ${\cal B}(Λ^{+}_{c}\to Σ^{+}ρ^{0},Λ^0 ρ^+)=(0.61\pm 0.46,0.74\pm 0.34)\%$, in agreement with the experimental upper bounds of $(1.7,6)\%$, respectively. We also find ${\cal B}(Λ^+_c \to Ξ^0 K^{*+},Σ^0 K^{*+},Λ^0 K^{*+}) =(8.7 \pm 2.7,1.2\pm 0.3,2.0\pm 0.5)\times 10^{-3}$ to be compatible with the pseudoscalar counterparts. For the doubly Cabibbo-suppressed decay $Ξ^+_c \to pϕ$, measured for the first time, we predict its branching ratio to be $(1.5\pm 0.7)\times 10^{-4}$, together with ${\cal B}(Ξ^+_c \to p \bar K^{*0},Σ^+ ϕ) =(7.8 \pm 2.2,1.9\pm0.9)\times 10^{-3}$. The ${\bf B}_c\to{\bf B}_n V$ decays with ${\cal B}\simeq {\cal O}(10^{-4}-10^{-3})$ are accessible to the BESIII, BELLEII and LHCb experiments.