Source author record

Yong Ma

Yong Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Computer Vision, cond-mat.mes-hall, Human-Computer Interaction, physics.atom-ph, physics.optics, quant-ph, Sound

Catalog footprint

What is connected

4 works

8 topics

4 close collaborators

preprint, 2026, arXiv

BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton

Multimodal resources for non-periodic court sports with laboratory-grade sensing remain scarce: few public datasets pair instrumented ground reaction force (GRF) with high-frame-rate multi-view video, which limits markerless load estimation in realistic training settings. BadmintonGRF records eight synchronized RGB views at ~120 FPS, four Kistler force plates, and Vicon motion capture (C3D) without hardware genlock across modalities; alignment combines human-verified events, automated quality assurance, and per-camera time offsets with uncertainty metadata. Tier 1 distributes pose, time-aligned GRF, metadata, and splits under CC BY-NC 4.0, enabling the primary benchmark without raw RGB or C3D; we report a Tier 1 task that maps 2D pose to GRF. Tier 2 provides raw RGB and C3D under controlled access for studies that require appearance or full kinematics. The public release contains 17,425 impact-segment archives in the 10-subject benchmark tree (156 instrumented trials; raw multi-view RGB alone exceeds 1 TB); benchmark loader gates retain 12,867 view-specific instances and 1,732 unique impacts after multi-view deduplication. We are not aware of prior public badminton corpora that combine this sensing layout with audited video-GRF alignment for impact-centric GRF estimation. We distribute preprocessing code, leave-one-subject-out splits, ten reference baselines, and optional late fusion (one deterministic test-time pass per instance; no test-time augmentation), with a within-trial diagnostic in the supplementary material.

preprint, 2026, arXiv

DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec

Codec-based language models (LMs) have revolutionized text-to-speech (TTS). However, standard codecs entangle timbre and prosody, which hinders independent control in continuation-based LMs. To tackle this challenge, we propose DisCo-Speech, a zero-shot controllable TTS framework featuring a disentangled speech codec (DisCodec) and an LM-based generator. The core component DisCodec employs a two-stage design: 1) tri-factor disentanglement to separate speech into content, prosody, and timbre subspaces via parallel encoders and hybrid losses; and 2) fusion and reconstruction that merges content and prosody into unified content-prosody tokens suitable for LM prediction, while jointly optimizing reconstruction to address the disentanglement-reconstruction trade-off. This allows the LM to perform prosodic continuation from a style prompt while the decoder injects target timbre, enabling flexible zero-shot control. Experiments demonstrate that DisCo-Speech achieves competitive voice cloning and superior zero-shot prosody control. By resolving the core entanglement at the codec level, DisCo-Speech provides a robust foundation for controllable speech synthesis.

preprint, 2022, arXiv

How Should Voice Assistants Deal With Users' Emotions?

There is a growing body of research in HCI on detecting users' emotions. Once it is possible to detect users' emotions reliably, the next question is how an emotion-aware interface should react to the detected emotion. As a first step, we tried to find out how humans deal with the negative emotions of an avatar. The hope behind this approach was to identify human strategies, which we can then mimic in an emotion-aware voice assistant. We present a user study in which participants were confronted with an angry, sad, or frightened avatar. Their task was to make the avatar happy by talking to it. We recorded the voice signal and analyzed it. The results show that users predominantly reacted with neutral emotion. However, we also found gender differences, which open a range of questions.

preprint, 2016, arXiv

Indistinguishable single photons with flexible electronic triggering

A key ingredient for quantum photonic technologies is an on-demand source of indistinguishable single photons. State-of-the-art indistinguishable single-photon sources typically employ resonant excitation pulses with fixed repetition rates, creating a string of single photons with predetermined arrival times. However, in future applications, an independent electronic signal from a larger quantum circuit or network will trigger the generation of an indistinguishable photon. Further, operating the photon source up to the limit imposed by its lifetime is desirable. Here, we report on the application of a true on-demand approach in which we can electronically trigger the precise arrival time of a single photon as well as control the excitation pulse duration based on resonance fluorescence from a single InAs/GaAs quantum dot. We investigate in detail the effect of the finite duration of an excitation π pulse on the degree of photon antibunching. Finally, we demonstrate that highly indistinguishable single photons can be generated using this on-demand approach, enabling maximum flexibility for future applications.