Source author record

Dawei Huang

Dawei Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computational Complexity; Data Structures and Algorithms; Databases; Information Theory; math.IT; Sound

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators


Preprint, 2026, arXiv

When Tone and Words Disagree: Towards Robust Speech Emotion Recognition under Acoustic-Semantic Conflict

Speech Emotion Recognition (SER) systems often assume congruence between vocal emotion and lexical semantics. However, in real-world interactions, acoustic-semantic conflict is common yet overlooked, where the emotion conveyed by tone contradicts the literal meaning of spoken words. We show that state-of-the-art SER models, including ASR-based, self-supervised learning (SSL) approaches and Audio Language Models (ALMs), suffer performance degradation under such conflicts due to semantic bias or entangled acoustic-semantic representations. To address this, we propose the Fusion Acoustic-Semantic (FAS) framework, which explicitly disentangles acoustic and semantic pathways and bridges them through a lightweight, query-based attention module. To enable systematic evaluation, we introduce the Conflict in Acoustic-Semantic Emotion (CASE), the first dataset dominated by clear and interpretable acoustic-semantic conflicts in varied scenarios. Extensive experiments demonstrate that FAS consistently outperforms existing methods in both in-domain and zero-shot settings. Notably, on the CASE benchmark, conventional SER models fail dramatically, while FAS sets a new SOTA with 59.38% accuracy. Our code and datasets are available at https://github.com/24DavidHuang/FAS.

Preprint, 2020, arXiv

Approximate Generalized Matching: $f$-Factors and $f$-Edge Covers

In this paper we present linear time approximation schemes for several generalized matching problems on nonbipartite graphs. Our results include $O_ε(m)$-time algorithms for $(1-ε)$-maximum weight $f$-factor and $(1+ε)$-approximate minimum weight $f$-edge cover. As a byproduct, we also obtain direct algorithms for the exact cardinality versions of these problems running in $O(m\sqrt{f(V)})$ time. The technical contributions of this work include an efficient method for maintaining \emph{relaxed complementary slackness} in generalized matching problems and approximation-preserving reductions between the $f$-factor and $f$-edge cover problems.

Preprint, 2020, arXiv

Joins on Samples: A Theoretical Guide for Practitioners

Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it leads to quadratically fewer output tuples. Unfortunately, this result has little applicability to the key questions practitioners face. For example, the success metric is often the final approximation's accuracy, rather than output cardinality. Moreover, there are many non-uniform sampling strategies that one can employ. Is sampling for joins still futile in all of these settings? If not, what is the best sampling strategy in each case? To the best of our knowledge, there is no formal study answering these questions. This paper aims to improve our understanding of sample-based joins and offer a guideline for practitioners building and using real-world AQP systems. We study limitations of offline samples in approximating join queries: given an offline sampling budget, how well can one approximate the join of two tables? We answer this question for two success metrics: output size and estimator variance. We show that maximizing output size is easy, while there is an information-theoretical lower bound on the lowest variance achievable by any sampling strategy. We then define a hybrid sampling scheme that captures all combinations of stratified, universe, and Bernoulli sampling, and show that this scheme with our optimal parameters achieves the theoretical lower bound within a constant factor. Since computing these optimal parameters requires shuffling statistics across the network, we also propose a decentralized variant where each node acts autonomously using minimal statistics.
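
To make the "quadratically fewer output tuples" observation concrete, here is a minimal Python sketch that is not taken from the paper. It compares joining two independent Bernoulli samples with joining two universe samples. Universe sampling is assumed here to mean keeping every row whose join key hashes below the rate $p$, using one hash function shared by both tables. The table shapes, the rate, and all function names are hypothetical choices made for illustration.

```python
import hashlib
import random
from collections import Counter


def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p."""
    return [row for row in rows if rng.random() < p]


def key_hash(key):
    """Map a join key to a pseudo-uniform value in [0, 1)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def universe_sample(rows, p):
    """Keep all rows whose join key hashes below p (same hash for every table)."""
    return [row for row in rows if key_hash(row[0]) < p]


def join_size(left, right):
    """Number of output tuples of an equi-join on the first column."""
    right_counts = Counter(row[0] for row in right)
    return sum(right_counts[row[0]] for row in left)


# Hypothetical tables: 1000 join keys, 5 rows per key in each table.
R = [(k, i) for k in range(1000) for i in range(5)]
S = [(k, j) for k in range(1000) for j in range(5)]
p = 0.1
rng = random.Random(0)  # fixed seed so the run is repeatable

full = join_size(R, S)
bern = join_size(bernoulli_sample(R, p, rng), bernoulli_sample(S, p, rng))
univ = join_size(universe_sample(R, p), universe_sample(S, p))

print("full join:", full)
print("Bernoulli x Bernoulli:", bern, "expected about", round(p * p * full))
print("universe x universe:", univ, "expected about", round(p * full))
```

In this setup every key contributes 25 output tuples, so the universe-sampled join returns exactly 25 tuples per selected key, while the Bernoulli-sampled join keeps each output tuple with probability $p^2$. Output size is only one of the two success metrics named in the abstract; the sketch says nothing about estimator variance.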

Preprint, 2020, arXiv

The Communication Complexity of Set Intersection and Multiple Equality Testing

In this paper we explore fundamental problems in randomized communication complexity such as computing Set Intersection on sets of size $k$ and Equality Testing between vectors of length $k$. Sağlam and Tardos and Brody et al. showed that for these types of problems, one can achieve optimal communication volume of $O(k)$ bits, with a randomized protocol that takes $O(\log^* k)$ rounds. Aside from rounds and communication volume, there is a \emph{third} parameter of interest, namely the \emph{error probability} $p_{\mathrm{err}}$. It is straightforward to show that protocols for Set Intersection or Equality Testing need to send $Ω(k + \log p_{\mathrm{err}}^{-1})$ bits. Is it possible to simultaneously achieve optimality in all three parameters, namely $O(k + \log p_{\mathrm{err}}^{-1})$ communication and $O(\log^* k)$ rounds? In this paper we prove that there is no universally optimal algorithm, and complement the existing round-communication tradeoffs with a new tradeoff between rounds, communication, and probability of error. In particular:

1. Any protocol for solving Multiple Equality Testing in $r$ rounds with failure probability $2^{-E}$ has communication volume $Ω(Ek^{1/r})$.
2. There exists a protocol for solving Multiple Equality Testing in $r + \log^*(k/E)$ rounds with $O(k + rEk^{1/r})$ communication, thereby essentially matching our lower bound and that of Sağlam and Tardos.

Our original motivation for considering $p_{\mathrm{err}}$ as an independent parameter came from the problem of enumerating triangles in distributed ($\textsf{CONGEST}$) networks having maximum degree $Δ$. We prove that this problem can be solved in $O(Δ/\log n + \log\log Δ)$ time with high probability $1-1/\operatorname{poly}(n)$.
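
The $\log p_{\mathrm{err}}^{-1}$ term can be illustrated with a textbook fingerprinting protocol for a single equality test, sketched below in Python. This is not the protocol from the paper. It assumes shared (public) randomness, modeled by a common seed, and all function names are hypothetical. Alice sends $E$ parity bits of her input against shared random masks. If the two inputs differ, each parity disagrees with probability $1/2$, so the test wrongly reports equality with probability $2^{-E}$.

```python
import random


def parity(v):
    """Parity (0 or 1) of the number of set bits in a non-negative int."""
    return bin(v).count("1") & 1


def fingerprint(x, masks):
    """One parity bit per mask: the inner product of x and the mask over GF(2)."""
    return [parity(x & m) for m in masks]


def equality_test(x, y, n_bits, num_parities, seed):
    """Return True if the fingerprints of x and y agree.

    x and y are n_bits-bit inputs encoded as ints. The seed stands in for
    shared randomness (an assumption of this sketch). Only num_parities
    bits would need to cross the channel.
    """
    rng = random.Random(seed)
    masks = [rng.getrandbits(n_bits) for _ in range(num_parities)]
    return fingerprint(x, masks) == fingerprint(y, masks)


n = 256
x = random.Random(1).getrandbits(n)
y = x ^ 1  # differs from x in exactly one bit

# Equal inputs always pass; unequal inputs pass only with probability 2**-64.
print(equality_test(x, x, n, 64, seed=42))
print(equality_test(x, y, n, 64, seed=42))
```

Equal inputs are never rejected, so the error is one-sided. Driving the error down to $2^{-E}$ costs $E$ bits in this sketch, which matches the additive $\log p_{\mathrm{err}}^{-1}$ term for one test; the tradeoffs for $k$ simultaneous tests are the subject of the paper and are not captured here.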
