Source author record

Yue Liu

Yue Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computation and Language, cond-mat.str-el, eess.AS, Machine Learning, quant-ph, Quantitative Methods, Software Engineering, Sound

Catalog footprint

What is connected

5 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

Multi-objective Bayesian inference in an agent-based model of zebrafish patterns via topological data analysis

Spatial patterns arising from the collective behavior of individual agents are present across biological systems. While agent-based models offer a natural framework for uncovering unknown agent (e.g., cell) interactions, these stochastic models face significant challenges. For spatial patterns, agent-based modeling often involves manual tuning to attain qualitative consistency with multiple experiments. This process limits predictive power and raises questions about parameter identifiability and model uniqueness. Combining topological techniques and Bayesian computation, we present a multi-objective methodology for parameter inference in detailed models. We illustrate our approach by inferring parameters in an agent-based model of zebrafish patterns, achieving practical identifiability in several case studies. By introducing extended prior distributions, we then reframe parameter inference as rule inference, allowing us to search across over 80 candidate agent-based rules to identify an alternative, simpler model consistent with our data.

preprint · 2026 · arXiv

On the Representation of Pairwise Causal Background Knowledge and Its Applications in Causal Inference

Pairwise causal background knowledge about the existence or absence of causal edges and paths is frequently encountered in observational studies. Such constraints allow the shared directed and undirected edges in the constrained subclass of Markov equivalent DAGs to be represented as a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and introduce a minimal representation of a causal MPDAG. Then, we give a unified representation for three types of pairwise causal background knowledge, including direct, ancestral and non-ancestral causal knowledge, by introducing a novel concept called direct causal clause (DCC). Using DCCs, we study the consistency and equivalence of pairwise causal background knowledge and show that any pairwise causal background knowledge set can be uniquely and equivalently decomposed into the causal MPDAG representing the refined Markov equivalence class and a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking consistency and equivalence, as well as for finding the decomposed MPDAG and the residual DCCs. Finally, with pairwise causal background knowledge, we prove a sufficient and necessary condition to identify causal effects and surprisingly find that the identifiability of causal effects only depends on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that pairwise causal background knowledge can significantly improve the identifiability of causal effects.

preprint · 2026 · arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks, or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65% on the benchmark and conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness to assist developers and researchers in future work at https://www.tbench.ai/.

preprint · 2025 · arXiv

Probing Defects with Quantum Simulator Snapshots

Snapshots, i.e. projective measurements of local degrees of freedom, are the most standard data taken in experiments on quantum simulators. Snapshots are usually used to probe local physics. In this work we propose a simple protocol to experimentally probe physics of defects with these snapshots. Our protocol relies only on snapshots from the bulk system, without introducing the defect explicitly; as such, the physics of different kinds of defects can be probed using the same dataset. In particular, we demonstrate that with snapshots of local spin configurations of, for example, the 1d Rydberg atom realization of the quantum Ising criticality, we can (1) extract the "defect entropy", and (2) access the continuous line of fixed points of effective defect conformal field theory, which was recently discussed in the context of the "weak-measurement altered criticality".

preprint · 2025 · arXiv

WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables

Wearable devices such as AI glasses are transforming voice assistants into always-available, hands-free collaborators that integrate seamlessly with daily life, but they also introduce challenges like egocentric audio affected by motion and noise, rapid micro-interactions, and the need to distinguish device-directed speech from background conversations. Existing benchmarks largely overlook these complexities, focusing instead on clean or generic conversational audio. To bridge this gap, we present WearVox, the first benchmark designed to rigorously evaluate voice assistants in realistic wearable scenarios. WearVox comprises 3,842 multi-channel, egocentric audio recordings collected via AI glasses across five diverse tasks including Search-Grounded QA, Closed-Book QA, Side-Talk Rejection, Tool Calling, and Speech Translation, spanning a wide range of indoor and outdoor environments and acoustic conditions. Each recording is accompanied by rich metadata, enabling nuanced analysis of model performance under real-world constraints. We benchmark leading proprietary and open-source speech Large Language Models (SLLMs) and find that most real-time SLLMs achieve accuracies on WearVox ranging from 29% to 59%, with substantial performance degradation on noisy outdoor audio, underscoring the difficulty and realism of the benchmark. Additionally, we conduct a case study with two new SLLMs that perform inference with single-channel and multi-channel audio, demonstrating that multi-channel audio inputs significantly enhance model robustness to environmental noise and improve discrimination between device-directed and background speech. Our results highlight the critical importance of spatial audio cues for context-aware voice assistants and establish WearVox as a comprehensive testbed for advancing wearable voice AI research.
