Source author record

Ming Du

Ming Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence; Biological Physics; Computational Engineering, Finance, and Science; Computer Vision; Databases; eess.IV; physics.app-ph; physics.data-an

Catalog footprint

What is connected

5 works

8 topics

4 close collaborators


preprint, 2026, arXiv

CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing

Scientific data processing often requires task-specific algorithms or AI models, creating a barrier for domain scientists who need to analyze their data but may not have extensive computing or image-processing expertise. This barrier is especially pronounced when data are noisy, have a high dynamic range, are sparsely labeled, or are only loosely specified. We introduce CVEvolve, an autonomous agentic harness with a zero-code interface for scientific data-processing algorithm discovery. CVEvolve combines a multi-round search strategy with tools for code execution, evaluation implementation, history management, holdout testing, and optional inspection of scientific data and visual outputs. The search alternates between discovery and improvement actions, and uses lineage-aware stochastic candidate sampling to balance exploration and exploitation. We demonstrate CVEvolve on x-ray fluorescence microscopy image registration, Bragg peak detection, and high-energy diffraction microscopy image segmentation. Across these tasks, CVEvolve discovers algorithms that improve over baseline methods, while holdout test tracking helps identify candidates that generalize better than later over-optimized alternatives. These results show that zero-code, autonomous LLM-powered algorithm development can help domain scientists turn unstructured scientific image data into practical algorithms and downstream scientific discoveries.

preprint, 2022, arXiv

A Wavelet Transform and self-supervised learning-based framework for bearing fault diagnosis with limited labeled data

Traditional supervised bearing fault diagnosis methods rely on large amounts of labelled data, yet annotation can be very time-consuming or infeasible. Fault diagnosis approaches that use limited labelled data are therefore becoming increasingly popular. In this paper, a Wavelet Transform (WT) and self-supervised learning-based bearing fault diagnosis framework is proposed to address the shortage of labelled samples. Using the WT and cubic spline interpolation, the original measured vibration signals are converted to time-frequency maps (TFMs) with a fixed scale, which serve as inputs. The Vision Transformer (ViT) is employed as the encoder for feature extraction, and the self-distillation with no labels (DINO) algorithm is introduced in the proposed framework for self-supervised learning with limited labelled data and sufficient unlabelled data. Two rolling bearing fault datasets are used for validation. When only 1% of the samples in each dataset are labelled, the feature vectors extracted by the trained encoder without fine-tuning yield over 90% average diagnosis accuracy with a simple K-Nearest Neighbor (KNN) classifier. Furthermore, the superiority of the proposed method is demonstrated in comparison with other self-supervised fault diagnosis methods.
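The final classification step described in this abstract, a KNN vote over encoder feature vectors, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `knn_predict`, the two-dimensional feature vectors, the label names, the Euclidean metric and `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    labelled feature vectors (Euclidean distance, an assumed metric)."""
    # Sort training indices by distance from the query vector.
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    # Count the labels of the k closest training vectors.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical encoder outputs for a handful of labelled samples.
train_features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = ["normal"] * 3 + ["inner_race_fault"] * 3

print(knn_predict(train_features, train_labels, [0.2, 0.1]))
print(knn_predict(train_features, train_labels, [5.5, 5.2]))
```

Because the encoder is not fine-tuned, the small labelled set is used only as the KNN reference set, which is why a simple distance-based vote is enough in this setup.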

preprint, 2022, arXiv

Efficient Reachability Ratio Computation for 2-hop Labeling Scheme

As one of the fundamental graph operations, reachability query processing has been studied extensively over the past decades. Many approaches design 2-hop labels to accelerate queries. Because the index size cannot be bounded when all nodes are used to construct 2-hop labels, researchers have proposed constructing the labels from a subset of important nodes (partial 2-hop labels) to cover as much reachability information as possible, which can yield better query performance with limited index size and index construction time. However, partial 2-hop labels do not always perform well on different graphs. In this paper, we focus on how to efficiently compute the reachability ratio, to help users determine whether partial 2-hop labels should be used to answer reachability queries on a given graph. Intuitively, the reachability ratio is the number of reachable queries that can be answered by partial 2-hop labels divided by the total number of reachable queries in the given graph. We discuss the difficulties of reachability ratio computation and propose an incremental-partition algorithm for it. Extensive experimental results show that our algorithm computes the reachability ratio efficiently, and show how overall query performance is affected by different partial 2-hop labels. Based on these results, we report our findings on whether partial 2-hop labels should be used for reachability query processing on a given graph.
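The definition of the reachability ratio given above can be made concrete with a brute-force sketch. This is not the incremental-partition algorithm from the paper; it simply enumerates every reachable pair and checks whether some chosen hop node lies on a path between them. The names `reachable_from`, `reachability_ratio` and `hop_nodes`, and the tiny example graph, are assumptions for illustration.

```python
from collections import deque


def reachable_from(adj, source):
    """Return the set of nodes reachable from `source` (itself included)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reachability_ratio(adj, nodes, hop_nodes):
    """Fraction of reachable pairs (u, v), u != v, that partial 2-hop labels
    built from `hop_nodes` can answer: some hop h with u -> h and h -> v."""
    reach = {u: reachable_from(adj, u) for u in nodes}
    hops = set(hop_nodes)
    total = covered = 0
    for u in nodes:
        out_label = reach[u] & hops  # hop nodes that u can reach
        for v in reach[u]:
            if v == u:
                continue
            total += 1
            # The pair is covered if any hop in u's out-label reaches v.
            if any(v in reach[h] for h in out_label):
                covered += 1
    return covered / total if total else 0.0


# Hypothetical graph: a -> b -> c and d -> c.
adj = {"a": ["b"], "b": ["c"], "d": ["c"]}
nodes = ["a", "b", "c", "d"]
print(reachability_ratio(adj, nodes, {"b"}))  # 0.75
print(reachability_ratio(adj, nodes, {"a"}))  # 0.5
```

In the example, choosing `b` as the only hop node covers three of the four reachable pairs, while choosing `a` covers two, which shows why the choice of important nodes matters. The brute-force version runs one BFS per node, so it only serves to pin down the definition on small graphs.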

preprint, 2022, arXiv

Searching for Apparel Products from Images in the Wild

In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often post images of themselves wearing different outfits, and their followers are often inspired to buy similar clothes. We propose a system that automatically finds the most visually similar clothes in an online catalog (street-to-shop searching). The problem is challenging because the original images are taken under different pose and lighting conditions. The system first localizes high-level descriptive regions (top, bottom, wristwear, ...) using multiple CNN detectors, such as YOLO and SSD, that are trained specifically for the apparel domain. It then classifies these regions into more specific categories such as t-shirts, tunics, or dresses. Finally, a feature embedding learned using a multi-task function is recovered for every item, compared with the corresponding items in the online catalog database, and ranked according to distance. We validate our approach component-wise using benchmark datasets and end-to-end using human evaluation.
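The last stage of the pipeline described above, ranking catalog items by embedding distance, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the distance metric, so Euclidean distance is assumed here, and the function name `rank_catalog`, the item identifiers and the two-dimensional embeddings are hypothetical. The catalog is assumed to be already filtered to the same category as the detected item.

```python
import math


def rank_catalog(query_embedding, catalog):
    """Return catalog item ids sorted from nearest to farthest embedding.

    `catalog` maps an item id to its embedding vector; Euclidean distance
    is an assumption, not a detail taken from the abstract.
    """
    return sorted(
        catalog,
        key=lambda item_id: math.dist(query_embedding, catalog[item_id]),
    )


# Hypothetical embeddings for three catalog items in one category.
catalog = {
    "shirt_a": [0.0, 0.0],
    "shirt_b": [1.0, 1.0],
    "shirt_c": [3.0, 4.0],
}
ranking = rank_catalog([0.9, 1.0], catalog)
print(ranking)
```

A production system would typically replace the full sort with an approximate nearest-neighbor index, but the ranking criterion stays the same.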

preprint, 2020, arXiv

Relative merits and limiting factors for x-ray and electron microscopy of thick, hydrated organic materials (revised)

Electron and x-ray microscopes allow one to image the entire, unlabeled structure of hydrated materials at a resolution well beyond what visible light microscopes can achieve. However, both approaches involve ionizing radiation, so radiation damage must be considered as one of the limits to imaging. Drawing upon earlier work, we describe here a unified approach to estimating the image contrast (and thus the required exposure and corresponding radiation dose) in both x-ray and electron microscopy. This approach accounts for factors such as plural and inelastic scattering, and (in electron microscopy) the use of energy filters to obtain so-called "zero loss" images. As expected, it shows that electron microscopy offers lower dose for specimens thinner than about 1 micron (such as for studies of macromolecules, viruses, bacteria and archaebacteria, and thin sectioned material), while x-ray microscopy offers superior characteristics for imaging thicker specimens such as whole eukaryotic cells, thick-sectioned tissues, and organs. The required radiation dose scales strongly as a function of the desired spatial resolution, allowing one to understand the limits of imaging live and frozen hydrated specimens. Finally, we consider the factors limiting x-ray microscopy of thicker materials, suggesting that specimens as thick as a whole mouse brain can be imaged with x-ray microscopes without significant image degradation should appropriate image reconstruction methods be identified. The as-published article [Ultramicroscopy 184, 293-309 (2018); doi:10.1016/j.ultramic.2017.10.003] had some minor mistakes that we correct here, with all changes from the as-published article shown in blue.
