Source author record

Jinjie Shi

Jinjie Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Computation and Language, Computer Vision, cond-mat.mtrl-sci, physics.app-ph

Catalog footprint

What is connected

2 works

5 topics

4 close collaborators

Preprint, 2025, arXiv

OmniBench: Towards The Future of Universal Omni-Language Models

Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains underexplored, partly due to the lack of comprehensive modality-wise benchmarks. We introduce OmniBench, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs). OmniBench is distinguished by high-quality human annotations, ensuring that accurate responses require integrated understanding and reasoning across all three modalities. Our main findings reveal that: i) open-source OLMs exhibit critical limitations in instruction-following and reasoning capabilities within tri-modal contexts; and ii) most baseline models perform poorly (below 50% accuracy) even when provided with alternative textual representations of images and/or audio. These results suggest that the ability to construct a consistent context from text, image, and audio is often overlooked in existing MLLM training paradigms. To address this gap, we curate an instruction tuning dataset of 84.5K training samples, OmniInstruct, for training OLMs to adapt to tri-modal contexts. We advocate for future research to focus on developing more robust tri-modal integration techniques and training strategies to enhance OLMs. Code, data, and a live leaderboard can be found at https://m-a-p.ai/OmniBench.

Preprint, 2019, arXiv

Three-dimensional acoustic double-zero-index medium with a Dirac-like point

We report a design and experimental realization of a three-dimensional (3D) acoustic double-zero-index medium (DZIM), whose effective mass density and compressibility are nearly zero simultaneously. The DZIM is constructed from a cubic lattice of three orthogonally-aligned metal rods in air. The combination of lattice symmetry and accidental degeneracy yields a four-fold degenerate point with conical dispersion at the Brillouin zone center, where the material becomes a 3D DZIM. Though occupying a finite volume, the 3D DZIM maintains the wave properties of a "void space," and enables rich applications. For demonstration, we fabricate an acoustic "periscope" by placing the designed 3D DZIM inside a 3D bending waveguide, and observe the unusual wave tunneling effect through this waveguide with undisturbed planar wavefront. Our findings establish a practical route to realize 3D DZIM as an effective acoustic "void space," which offers unprecedented opportunities for advanced sound manipulation.