Source author record

Yiming Ding

Yiming Ding appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, Machine Learning, math.IT, Artificial Intelligence, Computer Vision, math.PR, physics.soc-ph, cond-mat.other, hep-lat, Methodology, Multimedia, Neural and Evolutionary Computing, physics.chem-ph, physics.data-an, physics.optics, quant-ph, Social and Information Networks, Sound

Catalog footprint

What is connected

11 works

18 topics

4 close collaborators

Preprint · 2026 · arXiv

Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search

Current omni-modal benchmarks mainly evaluate models under settings where multiple modalities are provided simultaneously, while the ability to start from audio alone and actively search for cross-modal evidence remains underexplored. In this paper, we introduce Omni-DeepSearch, a benchmark for audio-driven omni-modal deep search. Given one or more audio clips and a related question, models must infer useful clues from audio, invoke text, image, and video search tools, and perform multi-hop reasoning to produce a short, objective, and verifiable answer. Omni-DeepSearch contains 640 samples across 15 fine-grained categories, covering four retrieval target modalities and four audio content types. A multi-stage filtering pipeline ensures audio dependence, retrieval necessity, visual modality necessity, and answer uniqueness. Experiments on recent closed-source and open-source omni-modal models show that this task remains highly challenging: the strongest evaluated model, Gemini-3-Pro, achieves only 43.44% average accuracy. Further analyses identify key bottlenecks in audio entity inference, query formulation, tool-use reliability, multi-hop retrieval, and cross-modal verification. These results highlight audio-driven omni-modal deep search as an important and underexplored direction for future multimodal agents.

Preprint · 2026 · arXiv

Retrieving Any Relevant Moments: Benchmark and Models for Generalized Moment Retrieval

Video Moment Retrieval (VMR) aims to localize temporal segments in videos that correspond to a natural language query, but typically assumes only a single matching moment for each query. This assumption does not always hold in real-world scenarios, where queries may correspond to multiple or no moments. Thus, we formulate Generalized Moment Retrieval (GMR), a unified setting that requires retrieving the complete set of relevant moments or predicting an empty set. To enable systematic study of GMR, we introduce Soccer-GMR, a large-scale benchmark built on challenging soccer videos that reflect general GMR scenarios, with realistic negative and positive queries. The benchmark is constructed via a duration-flexible semi-automated pipeline with human verification, enabling scalable data generation while maintaining high annotation quality. We further design a unified evaluation protocol with complementary metrics tailored for null-set rejection, positive-query localization, and end-to-end GMR performance. Finally, we establish strong baselines across two modeling paradigms: a lightweight plug-and-play GMR adapter for discriminative VMR models, and a GMR-tailored GRPO reward for fine-tuning multimodal large language models (MLLMs). Extensive experiments show consistent gains across all metrics and expose key limitations of current methods, positioning GMR as a more realistic and challenging benchmark for video-language understanding.
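The abstract names metrics for null-set rejection and positive-query localization but does not define them. The following is a minimal sketch of one plausible set-level score for a single query. It is not the paper's protocol, and the function names and the 0.5 temporal-IoU threshold are hypothetical. An empty gold set is treated as a negative query, so it is satisfied only by an empty prediction. For a positive query, predicted moments are greedily matched one-to-one to gold moments before an F1 is computed.

```python
def temporal_iou(a, b):
    """IoU of two [start, end] intervals (e.g. seconds)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def set_f1(pred, gold, thresh=0.5):
    """Hypothetical set-level F1 for one GMR query.

    - Empty gold set (negative query): 1.0 only if the prediction is empty.
    - Otherwise: greedy one-to-one matching of predicted to gold moments
      with tIoU >= thresh, then F1 over the matched count.
    """
    if not gold:
        return 1.0 if not pred else 0.0
    if not pred:
        return 0.0
    # All (iou, pred_idx, gold_idx) pairs, best overlaps first.
    pairs = sorted(
        ((temporal_iou(p, g), i, j)
         for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_p, used_g, matched = set(), set(), 0
    for iou, i, j in pairs:
        if iou < thresh:
            break  # remaining pairs overlap even less
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            matched += 1
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting `[[0, 10], [20, 30]]` against gold `[[0, 10], [50, 60]]` matches one moment out of two on each side, giving an F1 of 0.5.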

Preprint · 2022 · arXiv

Digital quantum simulation and Pseudoquantum Simulation of $\mathbb{Z}_2$ Gauge Higgs Model

We present a quantum algorithm for digital quantum simulation of the $\mathbb{Z}_2$ gauge-Higgs model on a $3\times 3$ lattice, based on Trotter decomposition, the quantum adiabatic algorithm, and its circuit realization. We then perform a classical demonstration, dubbed a pseudoquantum simulation, on a GPU simulator. The results on this model suggest the topological properties of the deconfined phase and help clarify the phase diagram. They also suggest that the tricritical point, where the second-order critical lines of the deconfinement-confinement transition and of the deconfinement-Higgs transition meet, lies on the first-order critical line of the confinement-Higgs transition, at a point other than the end of this line.
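The algorithm above relies on Trotter decomposition. The following is a toy illustration of first-order Trotterization with NumPy, not the $\mathbb{Z}_2$ gauge-Higgs Hamiltonian itself. The single-qubit Hamiltonian H = X + Z is an assumption chosen only because its two terms do not commute. The product (e^{-iXt/n} e^{-iZt/n})^n approaches e^{-i(X+Z)t}, and the error shrinks roughly as 1/n.

```python
import numpy as np

# Pauli matrices; toy Hamiltonian H = X + Z (assumed, for illustration only).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def expm_hermitian(h, t):
    """exp(-i t h) for a Hermitian matrix h, via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * t * w)) @ v.conj().T


def trotter(a, b, t, n):
    """First-order Trotter product (e^{-i a t/n} e^{-i b t/n})^n."""
    step = expm_hermitian(a, t / n) @ expm_hermitian(b, t / n)
    return np.linalg.matrix_power(step, n)


def trotter_error(t, n):
    """Spectral-norm distance between the Trotter product and the exact evolution."""
    exact = expm_hermitian(X + Z, t)
    return np.linalg.norm(trotter(X, Z, t, n) - exact, ord=2)
```

Going from `trotter_error(1.0, 10)` to `trotter_error(1.0, 100)` reduces the error by roughly a factor of ten, which is the expected first-order behaviour.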

Preprint · 2020 · arXiv

Dual-comb spectroscopy for high-temperature reaction kinetics

In the current study, a quantum-cascade-laser-based dual-comb spectrometer (DCS) was used to paint a detailed picture of a 1.0 ms high-temperature reaction between propyne and oxygen. The DCS was interfaced with a shock tube to provide pre-ignition conditions of 1225 K, 2.8 atm, and 2% p-C3H4/18% O2/Ar. The spectrometer consisted of two free-running, non-stabilized frequency combs, each emitting at 179 wavelengths between 1174 and 1233 cm⁻¹. A free spectral range, f_r, of 9.86 GHz and a difference in comb spacing, Δf_r, of 5 MHz enabled a theoretical time resolution of 0.2 µs, but the data were time-integrated to 4 µs to improve the signal-to-noise ratio (SNR). The accuracy of the spectrometer was monitored using a suite of independent laser diagnostics, and good agreement was observed.
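The quoted numbers are mutually consistent, as a short back-of-the-envelope check shows:

- The dual-comb interferogram repeats every 1/Δf_r, which gives the 0.2 µs resolution.
- Converting the 9.86 GHz spacing to wavenumbers gives about 0.329 cm⁻¹, so roughly 179 lines fit in the 1174 to 1233 cm⁻¹ band.

The square-root SNR gain from averaging assumes uncorrelated (white) noise between interferograms. That is an assumption made here, not a figure from the study.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

f_r = 9.86e9       # comb spacing (free spectral range) in Hz
delta_f_r = 5e6    # difference in comb spacing in Hz

# The interferogram refreshes every 1/delta_f_r seconds (expressed in us).
t_res_us = 1e6 / delta_f_r                 # 0.2 us

# Comb spacing in wavenumbers and the number of lines across the band.
spacing_cm = f_r / C_CM_PER_S              # about 0.329 cm^-1
n_lines = (1233 - 1174) / spacing_cm       # about 179

# Time-integrating to 4 us averages this many interferograms.
n_avg = 4.0 / t_res_us                     # 20
# Assumed white noise: SNR improves with the square root of the count.
snr_gain = n_avg ** 0.5                    # about 4.5
```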

Preprint · 2020 · arXiv

Goal-conditioned Imitation Learning

Designing rewards for Reinforcement Learning (RL) is challenging because a reward needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in reaching a wide range of configurations, so setting up a different reward every time might be impractical. Methods like Hindsight Experience Replay (HER) have recently shown promise for learning policies able to reach many goals without the need for a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might require many samples to discover how to reach certain areas of the state space. In this work we investigate different approaches to incorporating demonstrations to drastically speed up convergence to a policy able to reach any goal, also surpassing the performance of an agent trained with other Imitation Learning algorithms. Furthermore, we show that our method can also be used when the available expert trajectories do not contain the actions, so it can leverage kinesthetic or third-person demonstrations. The code is available at https://sites.google.com/view/goalconditioned-il/.
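The abstract builds on Hindsight Experience Replay. The following is a minimal sketch of HER's standard "future" relabeling strategy, which lets one trajectory teach many goals. It is not the paper's demonstration-based method. Several details are assumptions chosen for illustration:

- States and goals are plain float vectors.
- The reward is sparse: 0 within a tolerance of the goal, -1 otherwise.
- The names `Transition`, `sparse_reward`, `her_future_relabel`, the `tol` value, and `k` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    state: Sequence[float]
    action: Sequence[float]
    next_state: Sequence[float]
    goal: Sequence[float]
    reward: float


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0 if within tol of the goal (max-norm), else -1."""
    return 0.0 if max(abs(a - g) for a, g in zip(achieved, goal)) <= tol else -1.0


def her_future_relabel(states, actions, k=4, rng=None):
    """Relabel one trajectory with goals taken from states reached later.

    `states` has len(actions) + 1 entries. For each step t, k goals are
    sampled from states[t+1 .. T], so every relabeled goal was actually
    reached by the trajectory at or after that step.
    """
    rng = rng or random.Random(0)
    out: List[Transition] = []
    T = len(actions)
    for t in range(T):
        for _ in range(k):
            future = rng.randint(t + 1, T)  # inclusive upper bound
            goal = states[future]
            out.append(Transition(states[t], actions[t], states[t + 1], goal,
                                  sparse_reward(states[t + 1], goal)))
    return out
```

Because relabeled goals come from the agent's own trajectory, some transitions always carry a success reward, even when the original goal was never reached.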

Preprint · 2020 · arXiv

Mutual Information Maximization for Robust Plannable Representations

Extending the capabilities of robotics to complex, unstructured real-world environments requires better perception systems that maintain low sample complexity. When dealing with high-dimensional state spaces, current methods are either model-free or model-based with reconstruction objectives. The sample inefficiency of the former is a major barrier to applying them in the real world. The latter, while sample-efficient, learn latent spaces that must reconstruct every single detail of the scene. In real environments, the task typically involves only a small fraction of the scene, and reconstruction objectives suffer in such scenarios because they capture all the unnecessary components. In this work, we present MIRO, an information-theoretic representation learning algorithm for model-based reinforcement learning. We design a latent space that maximizes the mutual information with future information while capturing all the information needed for planning. We show that our approach is more robust than reconstruction objectives in the presence of distractors and cluttered scenes.

Preprint · 2016 · arXiv

An entropic characterization of long memory stationary process

Long memory, or long-range dependence, is an important phenomenon that may arise in the analysis of time series or spatial data. Most definitions of long memory of a stationary process $X=\{X_1, X_2, \ldots\}$ are based on the second-order properties of the process. The excess entropy of a stationary process is the sum of redundancies, which relates to the rate of convergence of the conditional entropy $H(X_n|X_{n-1},\ldots, X_1)$ to the entropy rate. It is proved that the excess entropy is identical to the mutual information between the past and the future when the entropy $H(X_1)$ is finite. We suggest defining a stationary process as long memory if its excess entropy is infinite. Since the definition of excess entropy requires only a very weak moment condition on the distribution of the process, it can be applied to processes whose distributions lack a bounded second moment. A significant property of excess entropy is that it is invariant under invertible transformations, which allows the excess entropy of a stationary process to be obtained from that of another process. For stationary Gaussian processes, the excess entropy characterization of long memory agrees well with the popular characterizations. It is proved that the excess entropy of fractional Gaussian noise is infinite if the Hurst parameter $H \in (1/2, 1)$.
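A finite-state Markov chain gives a contrasting example. For a stationary first-order chain, $H(X_n|X_{n-1},\ldots,X_1) = H(X_2|X_1)$ equals the entropy rate $h$ for every $n \ge 2$. Only the first redundancy is nonzero, so the excess entropy is $E = H(X_1) - h = I(X_1; X_2)$. That value is finite, so such chains are short memory under the proposed definition.

The following sketch computes $E$ in bits for an irreducible transition matrix. The helper names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()  # normalizing also fixes the sign


def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def markov_excess_entropy(P):
    """Excess entropy E = H(X_1) - h of a stationary first-order Markov chain.

    h = H(X_2 | X_1) is the entropy rate; every later redundancy
    H(X_n | X_{n-1}, ..., X_1) - h is zero by the Markov property.
    """
    pi = stationary_distribution(P)
    h = float(sum(pi[i] * entropy_bits(P[i]) for i in range(len(pi))))
    return entropy_bits(pi) - h
```

For an i.i.d. chain such as `np.array([[0.5, 0.5], [0.5, 0.5]])`, the result is 0. A "sticky" chain with stay probability 0.9 gives $1 - H_b(0.1) \approx 0.53$ bits, where $H_b$ is the binary entropy.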

Preprint · 2015 · arXiv

Wavelet-based Estimator for the Hurst Parameters of Fractional Brownian Sheet

A class of statistical estimators $\hat H =(\hat H_1, \ldots, \hat H_d)$ for the Hurst parameters $H=(H_1, \ldots, H_d)$ of a fractional Brownian field is proposed, based on multi-dimensional wavelet analysis and least squares; these estimators are asymptotically normal. They can be used to detect self-similarity and long-range dependence in multi-dimensional signals, which is important in texture classification and in improving diffusion tensor imaging (DTI) in nuclear magnetic resonance (NMR). Fractional Brownian sheets are simulated, and the simulated data are used to validate the estimators. We find that the estimators are efficient when $H_i \geq 1/2$ and exhibit some bias when $H_i < 1/2$.

Preprint · 2014 · arXiv

Evaluation of node importance in complex networks

The assessment of node importance has been a fundamental issue in the research of complex networks. In this paper, we propose using the Shannon-Parry measure (SPM) to evaluate the importance of a node quantitatively, because SPM is the stationary distribution of the most unprejudiced random walk on the network. We demonstrate the accuracy and robustness of SPM compared with several popular methods on the Zachary karate club network and three toy networks. We apply SPM to analyze city importance in the China Railways High-speed (CRH) network and obtain reasonable results. Since SPM can be used effectively in weighted and directed networks, we believe it is a relevant method for identifying key nodes in networks.
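The "most unprejudiced" random walk is the maximal-entropy random walk. For an irreducible nonnegative adjacency matrix $A$, its stationary distribution is the Shannon-Parry measure $\mu_i \propto u_i v_i$, where $u$ and $v$ are the left and right Perron eigenvectors of $A$. For an undirected graph $u = v$, so $\mu_i \propto v_i^2$.

The following NumPy sketch uses dense eigendecomposition, which assumes a small, strongly connected graph. The function names are hypothetical.

```python
import numpy as np


def perron_vector(M):
    """Largest real eigenvalue and a positive eigenvector of a nonnegative irreducible matrix."""
    w, v = np.linalg.eig(M)
    k = np.argmax(w.real)
    # Perron vector entries share one sign, so taking abs makes it positive.
    return w[k].real, np.abs(np.real(v[:, k]))


def shannon_parry_measure(A):
    """Stationary distribution of the maximal-entropy random walk on A.

    mu_i = u_i * v_i / sum_j u_j * v_j with v, u the right and left Perron
    eigenvectors; weighted and directed graphs are handled the same way.
    """
    A = np.asarray(A, dtype=float)
    _, v = perron_vector(A)
    _, u = perron_vector(A.T)
    mu = u * v
    return mu / mu.sum()
```

On a four-node path graph, the measure is about `[0.138, 0.362, 0.362, 0.138]`. This differs from the degree-proportional distribution of an ordinary random walk, `[1/6, 1/3, 1/3, 1/6]`.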

Preprint · 2011 · arXiv

The rates of convergence for generalized entropy of the normalized sums of IID random variables

We consider the generalized differential entropy of normalized sums of independent and identically distributed (IID) continuous random variables. We prove that the Rényi entropy and Tsallis entropy of order $\alpha$ ($\alpha>0$) of the normalized sum of IID continuous random variables with bounded moments converge to the corresponding Rényi entropy and Tsallis entropy of the Gaussian limit, and we obtain sharp rates of convergence.
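For reference, the Gaussian limit values have closed forms. The following standard computation uses natural logarithms and writes $\varphi_\sigma$ for a Gaussian density with variance $\sigma^2$; this notation is introduced here and is not taken from the paper. As $\alpha \to 1$, the Rényi value tends to the Shannon entropy $\tfrac12\log(2\pi e\sigma^2)$.

```latex
\int_{\mathbb{R}} \varphi_\sigma(x)^{\alpha}\,dx
  = (2\pi\sigma^2)^{\frac{1-\alpha}{2}}\,\alpha^{-1/2}

h_\alpha(\varphi_\sigma)
  = \frac{1}{1-\alpha}\log\int_{\mathbb{R}} \varphi_\sigma^{\alpha}\,dx
  = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\log\alpha}{2(\alpha-1)}

S_\alpha(\varphi_\sigma)
  = \frac{1-\int_{\mathbb{R}} \varphi_\sigma^{\alpha}\,dx}{\alpha-1}
  = \frac{1-(2\pi\sigma^2)^{\frac{1-\alpha}{2}}\,\alpha^{-1/2}}{\alpha-1}
```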

Preprint · 2010 · arXiv

How to Measure Significance of Community Structure in Complex Networks

Community structure analysis is a powerful tool for complex networks and can simplify their functional analysis considerably. Many approaches to community detection have been proposed recently, but few works have focused on the significance of community structure. Since real networks obtained from complex systems always contain erroneous links, and most community detection algorithms involve random factors, evaluating the significance of community structure is important and urgent. In this paper, we use the stability of eigenvectors to characterize the significance of community structures. Using the eigenvalues of the Laplacian matrix of a given network, we can evaluate the significance of its community structure and obtain the optimal number of communities, which is always hard for community detection algorithms to determine. We apply our method to many real networks. We find that significant community structures exist in many social networks and in the C. elegans neural network, and that less significant community structures appear in protein-interaction networks and metabolic networks. Our method can be applied to broad clustering problems in data mining due to its solid mathematical basis and efficiency.
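The abstract does not state the exact criterion for reading the number of communities from the Laplacian spectrum. As a point of comparison, the sketch below uses the common eigengap heuristic from spectral clustering, which is a swapped-in standard technique and not necessarily the paper's eigenvector-stability method. The heuristic takes the eigenvalues of $L = D - A$ in ascending order and picks the $k$ that maximizes $\lambda_{k+1} - \lambda_k$. The function name and `k_max` are hypothetical.

```python
import numpy as np


def eigengap_num_communities(A, k_max=10):
    """Estimate the number of communities with the eigengap heuristic.

    Eigenvalues of the unnormalized Laplacian L = D - A are sorted
    ascending; the returned k (1-indexed) maximizes lambda_{k+1} - lambda_k
    over k = 1..k_max.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam = np.linalg.eigvalsh(L)          # ascending for symmetric L
    k_max = min(k_max, len(lam) - 1)
    gaps = np.diff(lam[: k_max + 1])     # gaps[i] = lam[i+1] - lam[i]
    return int(np.argmax(gaps)) + 1
```

As an example, consider two 4-cliques joined by a single edge. The Laplacian spectrum is $0,\ 3-\sqrt7 \approx 0.35,\ 4,4,4,4,4,\ 3+\sqrt7$. The largest gap follows the second eigenvalue, so the heuristic returns 2.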
