Researcher profile

Qiyuan Hu

Qiyuan Hu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
5 topics
4 close collaborators

Published work

2 published item(s)

Preprint, 2022, arXiv

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

Multimodal video-audio-text understanding and generation can benefit from datasets that are narrow but rich. The narrowness allows bite-sized challenges that the research community can make progress on. The richness ensures we are making progress along the core challenges. To this end, we present a large-scale video-audio-text dataset MUGEN, collected using the open-sourced platform game CoinRun [11]. We made substantial modifications to make the game richer by introducing audio and enabling new interactions. We trained RL agents with different objectives to navigate the game and interact with 13 objects and characters. This allows us to automatically extract a large collection of diverse videos and associated audio. We sample 375K video clips (3.2s each) and collect text descriptions from human annotators. Each video has additional annotations that are extracted automatically from the game engine, such as accurate semantic maps for each frame and templated textual descriptions. Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation. We benchmark representative approaches on tasks involving video-audio-text retrieval and generation. Our dataset and code are released at: https://mugen-org.github.io/.

Preprint, 2022, arXiv

Nanomechanical testing of silica nanospheres for levitated optomechanics experiments

Optically-levitated dielectric particles can serve as ultra-sensitive detectors of feeble forces and torques, as tools for use in quantum information science, and as a testbed for quantum coherence in macroscopic systems. Knowledge of the structural and optical properties of the particles is important for calibrating the sensitivity of such experiments. Here we report the results of nanomechanical testing of silica nanospheres and investigate an annealing approach which can produce closer to bulk-like behavior in the samples in terms of their elastic moduli. These results, combined with our experimental investigations of optical trap lifetimes in high vacuum at high trapping-laser intensity for both annealed and as-grown nanospheres, were used to provide a theoretical analysis of the effects of porosity and non-sphericity in the samples, identifying possible mechanisms of trapping instabilities for nanospheres with non-bulk-silica-like properties.