Researcher profile

Ying Fu

Ying Fu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
13 works
0 followers
10 topics
4 close collaborators

Published work

13 published items

Preprint · 2026 · arXiv

Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness

Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels and therefore cannot achieve category-aware classification. To overcome these limitations, we propose Reference-based Category Discovery (RefCD), an unsupervised detector that enables category-aware detection without any manually annotated labels by leveraging the feature similarity between predicted objects and unlabeled reference images. Unlike previous unsupervised methods, which lack category guidance, and one-shot methods, which require labeled data, RefCD introduces a carefully designed feature similarity loss that explicitly guides the learning of potential category-specific features. RefCD also supports category-agnostic detection without reference images, serving as a unified framework. Comprehensive quantitative and qualitative analyses of category-aware and category-agnostic detection results demonstrate its effectiveness and show that RefCD can learn category information in an unsupervised paradigm, without category labels.
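
To make "category assignment by feature similarity" concrete, here is a minimal sketch, assuming each predicted object and each unlabeled reference image has already been encoded as a feature vector. `assign_categories` and `similarity_loss` are hypothetical illustrations; the abstract does not give RefCD's actual loss, so the (1 - cosine) form below is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_categories(object_feats, reference_feats):
    """Label each predicted object with the index of its most similar reference image."""
    labels, sims = [], []
    for feat in object_feats:
        scores = [cosine(feat, ref) for ref in reference_feats]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best)
        sims.append(scores[best])
    return labels, sims


def similarity_loss(sims):
    """Hypothetical loss: mean (1 - cosine) to the matched reference; 0 when fully aligned."""
    return sum(1.0 - s for s in sims) / len(sims)
```

Minimizing such a loss pulls each object's features toward its closest reference, which is one simple way a detector could pick up category structure without labels.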

Preprint · 2022 · arXiv

End-to-End Video Text Spotting with Transformer

Recent video text spotting methods usually require a three-stage pipeline: detecting text in individual images, recognizing localized text, and tracking text streams with post-processing to generate final results. These methods typically follow the tracking-by-match paradigm and develop sophisticated pipelines. In this paper, rooted in Transformer sequence modeling, we propose a simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR has two main advantages: 1) Unlike the explicit matching paradigm between adjacent frames, TransDETR tracks and recognizes each text instance implicitly through a dedicated query, termed a text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework, simultaneously addressing the three sub-tasks (i.e., text detection, tracking, and recognition). Extensive experiments on four video text datasets (ICDAR2013 Video, ICDAR2015 Video, Minetto, and YouTube Video Text) demonstrate that TransDETR achieves state-of-the-art performance, with improvements of up to about 8.0% on video text spotting tasks. The code of TransDETR is available at https://github.com/weijiawu/TransDETR.

Preprint · 2022 · arXiv

Estimating Fine-Grained Noise Model via Contrastive Learning

Image denoising has achieved unprecedented progress as great efforts have been made to exploit effective deep denoisers. To improve denoising performance in real-world settings, recent work follows two typical approaches: devising better noise models to synthesize more realistic training data, and estimating the noise level function to guide non-blind denoisers. In this work, we combine noise modeling and estimation, and propose an innovative noise model estimation and noise synthesis pipeline for realistic noisy image generation. Specifically, our model learns a noise estimation model with a fine-grained statistical noise model in a contrastive manner. We then use the estimated noise parameters to model the camera-specific noise distribution and synthesize realistic noisy training data. Most strikingly, by calibrating the noise models of several sensors, our model can be extended to predict the noise of other cameras. In other words, we can estimate camera-specific noise models for unknown sensors from testing images alone, without laborious calibration frames or paired noisy/clean data. The proposed pipeline enables deep denoisers to perform competitively with state-of-the-art real noise modeling methods.
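
The abstract says the noise estimator is trained "in a contrastive manner" without naming the objective. As an assumption, the sketch below uses the standard InfoNCE loss: the embedding of a noisy patch (the anchor) should be more similar to a patch with the same noise parameters (the positive) than to patches with different parameters (the negatives). The embeddings are hypothetical inputs, and `info_nce` is an illustrative name rather than the authors' code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[0]
```

With no preference between candidates the loss equals log(1 + number of negatives); it approaches zero as the positive becomes clearly the most similar.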

Preprint · 2022 · arXiv

ProbNVS: Fast Novel View Synthesis with Learned Probability-Guided Sampling

Existing state-of-the-art novel view synthesis methods rely on either fairly accurate 3D geometry estimation or sampling of the entire space for neural volumetric rendering, which limits overall efficiency. To improve rendering efficiency by reducing the number of sampling points without sacrificing rendering quality, we propose a novel view synthesis framework based on learned MVS priors that enables general, fast, and photo-realistic view synthesis simultaneously. Specifically, fewer but important points are sampled under the guidance of depth probability distributions extracted from the learned MVS architecture. Based on this learned probability-guided sampling, a neural volume rendering module is carefully designed to fully aggregate source-view information as well as the learned scene structures to synthesize photorealistic target-view images. Finally, the rendering results in uncertain, occluded, and unreferenced regions can be further improved by incorporating a confidence-aware refinement module. Experiments show that our method renders 15 to 40 times faster than state-of-the-art baselines, with strong generalization capacity and comparably high-quality novel view synthesis.
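
Probability-guided sampling can be illustrated with inverse-CDF sampling over a discrete set of depth hypotheses along one ray. This is a minimal sketch, assuming the MVS network already outputs a weight for each depth bin; `sample_depths` is a hypothetical helper, and the paper's exact sampling rule may differ.

```python
import bisect
import random


def sample_depths(depth_bins, probs, num_samples, rng: random.Random):
    """Draw depths along a ray in proportion to a discrete depth probability distribution.

    depth_bins: candidate depths along the ray (one per MVS depth hypothesis)
    probs: non-negative weights per bin (they need not sum to 1)
    """
    cumulative, acc = [], 0.0
    for p in probs:
        acc += p
        cumulative.append(acc)
    samples = []
    for _ in range(num_samples):
        u = rng.random() * acc  # uniform in [0, total weight)
        # First bin whose cumulative weight exceeds u; zero-weight bins are skipped.
        i = min(bisect.bisect_right(cumulative, u), len(cumulative) - 1)
        samples.append(depth_bins[i])
    return sorted(samples)
```

Because samples concentrate in high-probability bins, a handful of points per ray can stand in for dense uniform sampling of the whole ray.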

Preprint · 2021 · arXiv

Heat Transport in Herbertsmithite: Can a Quantum Spin Liquid Survive Disorder?

Arguably, the most favorable situation for spins to enter the long-sought quantum spin liquid (QSL) state is when they sit on a kagome lattice. No theoretical consensus has been reached regarding the true ground state of this promising platform. Experimental efforts, relying mostly on one archetypal material, ZnCu$_3$(OH)$_6$Cl$_2$, have also led to diverse possibilities. Apart from subtle interactions in the Hamiltonian, there is an additional degree of complexity associated with disorder in the real material ZnCu$_3$(OH)$_6$Cl$_2$ that haunts most experimental probes. Here we turn to heat transport measurements, a cleaner probe in which disorder does not contribute directly but only affects the signal from the kagome spins. For ZnCu$_3$(OH)$_6$Cl$_2$ and a related QSL candidate, Cu$_3$Zn(OH)$_6$FBr, we observed neither a contribution from any spin excitation nor any field-induced change in the thermal conductivity. These results impose different constraints on the various scenarios for the ground state of these two kagome compounds: while a gapped QSL, or a certain quantum paramagnetic state other than a QSL, is compatible with our results, a gapless QSL must be dramatically modified by disorder so that gapless spin excitations are localized.

Preprint · 2021 · arXiv

Partial FC: Training 10 Million Identities on a Single Machine

Face recognition has long been an active and vital topic in the computer vision community. Previous research has mainly focused on the loss functions used for facial feature extraction networks, and improvements to softmax-based loss functions in particular have greatly advanced face recognition performance. However, the conflict between the rapidly increasing number of face identities and limited GPU memory is becoming irreconcilable. In this paper, we thoroughly analyze the optimization goal of softmax-based loss functions and the difficulty of training with massive numbers of identities. We find that negative classes in the softmax function are less important for face representation learning than previously thought. Experiments show no loss of accuracy when training softmax-based loss functions with only 10% of the classes randomly sampled, compared with training on all classes, using state-of-the-art models on mainstream benchmarks. We also implement a highly efficient distributed sampling algorithm that balances model accuracy and training efficiency, using only eight NVIDIA RTX 2080 Ti GPUs to complete classification tasks with tens of millions of identities. The code is available at https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc.
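
The core observation, that the softmax can be normalized over a random subset of classes, can be sketched in a single process. This is a minimal sketch, assuming one logit per class is already available; `partial_fc_sample` and `sampled_softmax_loss` are hypothetical names, and the distributed sharding of class centers across GPUs described in the paper is not modeled. Positive classes from the batch are always kept so that every sample's target logit takes part in the normalization.

```python
import math
import random


def partial_fc_sample(labels, num_classes, sample_rate, rng: random.Random):
    """Return sorted class ids: every positive class in the batch plus random negatives."""
    positives = set(labels)
    num_sample = max(len(positives), int(num_classes * sample_rate))
    negatives = [c for c in range(num_classes) if c not in positives]
    extra = rng.sample(negatives, num_sample - len(positives))
    return sorted(positives | set(extra))


def sampled_softmax_loss(logits, label, sampled):
    """Cross-entropy for one sample, normalized over the sampled classes only."""
    sub = [logits[c] for c in sampled]
    m = max(sub)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in sub))
    return log_z - logits[label]
```

With `sample_rate=0.1`, each step touches a tenth of the class centers, which is what lets the classifier layer scale to very large identity counts.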

Preprint · 2021 · arXiv

V-Gas: Generating High Gas Consumption Inputs to Avoid Out-of-Gas Vulnerability

An out-of-gas error occurs when a smart contract is given inputs that cause excessive gas consumption, and such errors can easily be exploited to mount denial-of-service (DoS) attacks. Multiple approaches have been proposed to estimate the gas limit of a function in a smart contract to avoid such errors. However, underestimation often occurs when the contract is complicated. In this work, we propose V-Gas, which automatically generates inputs that maximize gas cost, reducing underestimation cases. V-Gas is based on feedback-directed mutational fuzz testing. First, V-Gas builds a gas-weighted control flow graph (CFG) of the functions in a smart contract. Then, V-Gas applies gas-consumption-guided selection and mutation strategies to generate inputs that maximize gas consumption. For evaluation, we implement V-Gas on top of js-evm, a widely used Ethereum Virtual Machine written in JavaScript, and conduct experiments on 736 real-world transactions recorded on Ethereum. Under the gas estimates given by solc, 44.02% of the transactions would have out-of-gas errors, meaning that the real gas consumption recorded for those transactions exceeds the gas limit estimated by solc; V-Gas reduces this underestimation ratio to 13.86%. Furthermore, V-Gas has exposed 25 previously unknown out-of-gas vulnerabilities in these widely used smart contracts, 5 of which have been assigned unique CVE identifiers in the US National Vulnerability Database.
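
The feedback loop behind gas-guided fuzzing can be sketched in a few lines. This is a simplified illustration, not V-Gas itself: `toy_gas` is a hypothetical stand-in for executing a contract function in an EVM such as js-evm, and the gas-weighted CFG that V-Gas uses to steer selection and mutation is replaced here by a plain ranking of inputs by measured gas.

```python
import random


def toy_gas(args):
    """Hypothetical stand-in for running one contract function in an EVM.

    Gas grows with a loop bound n, plus an extra cost on one branch of flag.
    """
    n, flag = args
    gas = 21_000 + 50 * min(n, 10_000)
    if flag % 7 == 3:
        gas += 20_000  # e.g. an extra storage write on this branch
    return gas


def mutate(args, rng: random.Random):
    """Randomly perturb one argument of an input tuple."""
    n, flag = args
    if rng.random() < 0.5:
        n = max(0, n + rng.choice((-1, 1)) * rng.randint(1, 1_000))
    else:
        flag = rng.randint(0, 255)
    return (n, flag)


def gas_guided_fuzz(seeds, execute, iterations, rng: random.Random, pool_size=8):
    """Keep the highest-gas inputs, mutate them preferentially; return (gas, input)."""
    pool = sorted(((execute(s), s) for s in seeds), reverse=True)[:pool_size]
    for _ in range(iterations):
        # Selection biased toward high gas: parents come from the top half of the pool.
        _, parent = pool[rng.randrange(max(1, len(pool) // 2))]
        child = mutate(parent, rng)
        pool.append((execute(child), child))
        pool.sort(reverse=True)
        del pool[pool_size:]
    return pool[0]
```

The highest gas found this way is a lower bound on the true worst case, which is why such inputs can expose gas limits that were estimated too low.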

Preprint · 2020 · arXiv

3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising

In this paper, we propose an alternating directional 3D quasi-recurrent neural network for hyperspectral image (HSI) denoising, which effectively embeds domain knowledge: structural spatio-spectral correlation and global correlation along the spectrum. Specifically, 3D convolution is used to extract structural spatio-spectral correlation in an HSI, while a quasi-recurrent pooling function captures the global correlation along the spectrum. Moreover, an alternating directional structure is introduced to eliminate causal dependency at no additional computational cost. The proposed model can capture spatio-spectral dependency while remaining flexible for HSIs with an arbitrary number of bands. Extensive experiments on HSI denoising demonstrate significant improvements over state-of-the-art methods under various noise settings, in terms of both restoration accuracy and computation time. Our code is available at https://github.com/Vandermode/QRNN3D.
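
The quasi-recurrent pooling step can be sketched as a gated running average along the spectral axis. This minimal sketch assumes the f-pooling recurrence from quasi-recurrent neural networks, $h_t = f_t \odot h_{t-1} + (1 - f_t) \odot z_t$, with the candidate features $z$ and forget gates $f$ given as inputs; in the paper these come from 3D convolutions, which are omitted here, and `quasi_recurrent_pool` is a hypothetical name.

```python
def quasi_recurrent_pool(z, f, reverse=False):
    """Quasi-recurrent pooling along the spectral (band) axis.

    z: candidate features, z[t][i] for band t and spatial position i
    f: forget gates in [0, 1], same shape as z
    Applies h_t = f_t * h_{t-1} + (1 - f_t) * z_t, scanning bands forward or backward.
    """
    order = range(len(z) - 1, -1, -1) if reverse else range(len(z))
    h = [0.0] * len(z[0])
    out = [None] * len(z)
    for t in order:
        h = [ft * hp + (1.0 - ft) * zt for ft, hp, zt in zip(f[t], h, z[t])]
        out[t] = h
    return out
```

In an alternating directional stack, successive layers would call this with `reverse=False` and `reverse=True` in turn, so after two layers every band has received context from both spectral directions. The loop length follows the input, so any number of bands is handled.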

Preprint · 2020 · arXiv

A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising

Lacking rich and realistic data, learned single-image denoising algorithms generalize poorly to real raw images that do not resemble the data used for training. Although the problem can be alleviated by the heteroscedastic Gaussian model for noise synthesis, the noise sources caused by digital camera electronics are still largely overlooked, despite their significant effect on raw measurements, especially under extremely low-light conditions. To address this issue, we present a highly accurate noise formation model based on the characteristics of CMOS photosensors, which enables us to synthesize realistic samples that better match the physics of the image formation process. Given the proposed noise model, we additionally propose a method to calibrate the noise parameters of available modern digital cameras, which is simple and reproducible for any new device. We systematically study the generalizability of neural networks trained with existing schemes by introducing a new low-light denoising dataset that covers many modern digital cameras from diverse brands. Extensive empirical results collectively show that, using our proposed noise formation model, a network can perform as if it had been trained with rich real data, demonstrating the effectiveness of our noise formation model.
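
The heteroscedastic Gaussian baseline mentioned in this abstract models noise whose variance grows linearly with the clean signal, $\sigma^2(x) = K x + \sigma_r^2$. The sketch below implements that baseline only, not the paper's CMOS-based noise formation model; the optional per-row offset is a hypothetical stand-in for one electronics-related noise source, and all names and parameter values are illustrative.

```python
import math
import random


def synthesize_noisy_raw(clean, gain_k, read_sigma, rng: random.Random, row_sigma=0.0):
    """Add signal-dependent Gaussian noise, with an optional per-row offset.

    clean: 2-D list (rows of linear raw values)
    gain_k: slope of variance vs. signal (shot-noise term)
    read_sigma: std of signal-independent read noise
    row_sigma: std of an offset shared by all pixels in a row (illustrative extra source)
    """
    noisy = []
    for row in clean:
        offset = rng.gauss(0.0, row_sigma) if row_sigma > 0 else 0.0
        noisy.append([
            x + offset + rng.gauss(0.0, math.sqrt(gain_k * max(x, 0.0) + read_sigma ** 2))
            for x in row
        ])
    return noisy
```

For a flat patch at level 100 with `gain_k=0.5` and `read_sigma=2.0`, the expected noise variance is 0.5 × 100 + 2² = 54.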

Preprint · 2020 · arXiv

Core-level x-ray photoemission and Raman spectroscopy studies on electronic structures in Mott-Hubbard type nickelate oxide NdNiO$_2$

We perform core-level X-ray photoemission spectroscopy (XPS) and electronic Raman scattering studies of the electronic structure and spin fluctuations in bulk samples of the nickelate oxide NdNiO$_2$. From the Nd $3d$ and O $1s$ XPS spectra, we conclude that NdNiO$_2$ has a large charge-transfer energy. From an analysis of the main line of the Ni $2p_{3/2}$ XPS spectrum, we confirm that the NiO$_2$ planes in NdNiO$_2$ are of Mott-Hubbard type in the Zaanen-Sawatzky-Allen scheme. The two-magnon peak in the Raman spectrum provides direct evidence of strong spin fluctuations in NdNiO$_2$, and its position determines the antiferromagnetic exchange $J = 25$ meV. Our experimental results agree well with our previous theoretical results.

Preprint · 2020 · arXiv

Crystalline and magnetic structures, magnetization, heat capacity and anisotropic magnetostriction effect in a yttrium-chromium oxide

We have studied a nearly stoichiometric insulating Y$_{0.97(2)}$Cr$_{0.98(2)}$O$_{3.00(2)}$ single crystal by measuring magnetization, heat capacity, and neutron diffraction. Although the YCrO$_3$ compound behaves like a soft ferromagnet with a coercive force of $\sim$0.05 T, strong antiferromagnetic (AFM) interactions exist between Cr$^{3+}$ spins, as indicated by a strongly negative paramagnetic Curie-Weiss temperature of -433.2(6) K. The coexistence of ferromagnetism and antiferromagnetism may indicate a canted AFM structure. The AFM phase transition occurs at $T_\textrm{N} =$ 141.5(1) K, which increases to $T_\textrm{N}$(5 T) = 144.5(1) K in a 5 T field. Within the accuracy of the present neutron-diffraction studies, we determine a G-type AFM structure below $T_\textrm{N}$, with propagation vector $\mathbf{k}$ = (1 1 0) and Cr$^{3+}$ spins oriented along the crystallographic $c$ axis of the orthorhombic structure with space group $Pnma$. At 12 K, the refined moment size is 2.45(6) $\mu_\textrm{B}$, $\sim$82% of the theoretical saturation value of 3 $\mu_\textrm{B}$. The Cr$^{3+}$ spin interactions are probably two-dimensional and Ising-like within the reciprocal (1 1 0) scattering plane. Below $T_\textrm{N}$, the lattice parameters ($a$, $b$, $c$) and unit-cell volume ($V$) deviate strongly downward from the Grüneisen law, displaying an anisotropic magnetostriction effect and a magnetoelastic effect. In particular, the sample's contraction upon cooling is enhanced below the AFM transition temperature. There is evidence to suggest that the actual crystalline symmetry of the YCrO$_3$ compound is probably lower than the currently assumed one. Additionally, we compared the $t_{2\textrm{g}}$ YCrO$_3$ and $e_\textrm{g}$ La$_{7/8}$Sr$_{1/8}$MnO$_3$ single crystals to better understand the possible symmetry lowering.
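
As a quick consistency check on the quoted percentage, assuming the theoretical saturation value is the spin-only moment of Cr$^{3+}$ ($3d^3$, $S = 3/2$, $g \approx 2$):

```latex
M_\mathrm{sat} = g S \mu_\mathrm{B} = 2 \times \tfrac{3}{2}\,\mu_\mathrm{B} = 3\,\mu_\mathrm{B},
\qquad
\frac{2.45\,\mu_\mathrm{B}}{3\,\mu_\mathrm{B}} \approx 0.817 \approx 82\%.
```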

Preprint · 2020 · arXiv

EasyQuant: Post-training Quantization via Scale Optimization

8-bit quantization has been widely applied to accelerate network inference in various deep learning applications. There are two kinds of quantization methods: training-based quantization and post-training quantization. Training-based approaches suffer from a cumbersome training process, while post-training quantization may lead to an unacceptable accuracy drop. In this paper, we present an efficient and simple post-training method via scale optimization, named EasyQuant (EQ), that obtains accuracy comparable to training-based methods. Specifically, we first alternately optimize the scales of weights and activations for all layers, targeting the convolutional outputs, to obtain high quantization precision. Then, we lower the bit width to INT7 for both weights and activations, and adopt INT16 intermediate storage and an integer Winograd convolution implementation to accelerate inference. Experimental results on various computer vision tasks show that EQ outperforms the TensorRT method and achieves near-INT8 accuracy with 7-bit post-training quantization.
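
The alternating scale optimization described in this abstract can be sketched for a single fully connected layer. This is a simplified illustration under several assumptions: per-tensor (not per-channel) scales, cosine similarity between float and quantized outputs as the objective, and an arbitrary grid of candidate scales around the usual max-abs initialization. `easyquant_layer` and its helpers are hypothetical names, and the INT7/INT16 and Winograd details are not modeled.

```python
import math


def quantize(values, scale, bits=8):
    """Symmetric fake-quantization: round to integers in [-qmax, qmax], then rescale."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def layer_outputs(weights, inputs):
    """Outputs of a fully connected layer (a stand-in for convolution outputs)."""
    return [sum(w * x for w, x in zip(row, vec)) for vec in inputs for row in weights]


def search_scale(evaluate, base, steps=100, low=0.5, high=1.2):
    """Grid-search a scale in [low*base, high*base], starting from base; keep the best."""
    best_scale, best_score = base, evaluate(base)
    for i in range(steps + 1):
        s = base * (low + (high - low) * i / steps)
        score = evaluate(s)
        if score > best_score:
            best_scale, best_score = s, score
    return best_scale, best_score


def easyquant_layer(weights, inputs, bits=8, rounds=3):
    """Alternately tune weight and activation scales to maximize output cosine similarity."""
    qmax = 2 ** (bits - 1) - 1
    reference = layer_outputs(weights, inputs)
    # Start from the usual max-abs scales.
    sw = max(abs(v) for row in weights for v in row) / qmax
    sx = max(abs(v) for vec in inputs for v in vec) / qmax

    def score(w_scale, x_scale):
        qw = [quantize(row, w_scale, bits) for row in weights]
        qx = [quantize(vec, x_scale, bits) for vec in inputs]
        return cosine(reference, layer_outputs(qw, qx))

    best = score(sw, sx)
    for _ in range(rounds):
        sw, best = search_scale(lambda s: score(s, sx), sw)  # fix sx, tune sw
        sx, best = search_scale(lambda s: score(sw, s), sx)  # fix sw, tune sx
    return sw, sx, best
```

Because each search starts from the current scale, the similarity never decreases across rounds; calling `easyquant_layer(weights, calibration_inputs, bits=7)` would return the tuned scales and the similarity they reach.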

Preprint · 2020 · arXiv

Hyperspectral City V1.0 Dataset and Benchmark

This document introduces the background and usage of the Hyperspectral City Dataset and benchmark. It begins with the background and motivation of the dataset. We then briefly describe how the dataset was collected and how it was processed from the raw data into the final released dataset, specifically version 1.0. We also provide detailed usage instructions for the dataset and the evaluation metric for submitting results to the 2019 Hyperspectral City Challenge.