Source author record

Rui Cai

Rui Cai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: Unclaimed source record

Computer Vision, cond-mat.mes-hall, Machine Learning, physics.app-ph, Multimedia

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint, 2026, arXiv

ModelLens: Finding the Best for Your Task from Myriads of Models

The open-source model ecosystem now contains hundreds of thousands of pretrained models, yet picking the best model for a new dataset is increasingly infeasible: new models and unbenchmarked datasets emerge continuously, leaving practitioners with no prior records on either side. Existing approaches handle only fragments of this in-the-wild setting: AutoML and transferability estimation select models from small predefined pools or require expensive per-model forward passes on the target dataset, while model routing presupposes a given candidate pool. We introduce ModelLens, a unified framework for model recommendation in the wild. Our key insight is that public leaderboard interactions, though scattered and noisy, collectively trace out an implicit atlas of model capabilities across heterogeneous evaluation settings, a signal rich enough to learn from directly. By learning a performance-aware latent space over model-dataset-metric tuples, ModelLens ranks unseen models on unseen datasets without running candidates on the target dataset. On a new benchmark of 1.62M evaluation records spanning 47K models and 9.6K datasets, ModelLens surpasses baselines that either rely on metadata alone or require running each candidate on the target dataset. Its recommended Top-K pools further improve multiple representative routing methods by up to 81% across diverse QA benchmarks. Case studies on recently released benchmarks further confirm generalization to both text and vision-language tasks.

Preprint, 2022, arXiv

Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

Despite recent developments in cross-modal retrieval, less research has focused on low-resource languages because manually annotated datasets are lacking. In this paper, we propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages. To this end, we use Machine Translation (MT) to construct pseudo-parallel sentence pairs for low-resource languages. However, as MT is not perfect, it tends to introduce noise during translation, corrupting the textual embeddings and thereby compromising retrieval performance. To alleviate this, we introduce a multi-view self-distillation method to learn noise-robust target-language representations, which employs a cross-attention module to generate soft pseudo-targets that provide direct supervision from the similarity-based view and the feature-based view. In addition, inspired by back-translation in unsupervised MT, we minimize the semantic discrepancies between original sentences and back-translated sentences to further improve the noise robustness of the textual encoder. Extensive experiments are conducted on three video-text and image-text cross-modal retrieval benchmarks across different languages, and the results demonstrate that our method significantly improves the overall performance without using extra human-labeled data. In addition, equipped with a pre-trained visual encoder from a recent vision-and-language pre-training framework, i.e., CLIP, our model achieves a significant performance gain, showing that our method is compatible with popular pre-training models. Code and data are available at https://github.com/HuiGuanLab/nrccr.

Preprint, 2020, arXiv

Distinguishing between dynamical and static Rashba effects in hybrid perovskite nanocrystals using transient absorption spectroscopy

The dynamical and static Rashba effects in hybrid methylammonium (MA) lead halide perovskites have recently been theoretically predicted. However, only the static effect has been experimentally confirmed so far. Here we report on the dynamical Rashba effect observed using snapshot transient absorption spectral imaging with 400 nm pumping for a fully encapsulated film of 20-nm-sized 3D MAPbBr3 nanocrystals. The effect causes a 240 meV splitting of the lowest-energy absorption bleaching band, initially appearing over a sub-ps timescale and progressively stabilizing to 60 meV during 500 ps. The integrated intensities of the split subbands demonstrate a photon-helicity-dependent asymmetry, thus proving the Rashba-type splitting and providing direct experimental evidence for the Rashba spin-split edge states in lead halide perovskite materials. The ultrafast dynamics is governed by the relaxation of two-photon-excited electrons in the Rashba spin-split system caused by a built-in electric field originating from dynamical charge separation in the entire MAPbBr3 nanocrystal.

Preprint, 2020, arXiv

Structural phase transitions and photoluminescence mechanism in a layer of 3D hybrid perovskite nanocrystals

Although structural phase transitions in single-crystal hybrid methylammonium (MA) lead halide perovskites (MAPbX3, X = Cl, Br, I) are common phenomena, they have never been observed in the corresponding nanocrystals. Here we demonstrate that two-photon-excited photoluminescence (PL) spectroscopy is capable of monitoring the structural phase transitions in MAPbX3 nanocrystals because nonlinear susceptibilities govern the light absorption rates. We provide experimental evidence that the orthorhombic-to-tetragonal structural phase transition in a single layer of 20-nm-sized 3D MAPbBr3 nanocrystals is spread out within the 70-140 K range. This structural phase instability range arises because, unlike in single-crystal MAPbX3, free rotations of MA ions in the corresponding nanocrystals are no longer restricted by a long-range MA dipole order. The resulting configurational entropy loss can even be enhanced by the interfacial electric field arising due to charge separation at the MAPbBr3/ZnO heterointerface, extending the orthorhombic-to-tetragonal structural phase instability range from 70 to 230 K. We conclude that the weak sensitivity of conventional one-photon-excited PL spectroscopy to the structural phase transitions in 3D MAPbX3 nanocrystals results from the structural phase instability providing negligible distortions of PbX6 octahedra. In contrast, the intensity of two-photon-excited PL and electric-field-induced one-photon-excited PL remains sufficiently sensitive to weak structural distortions due to the higher-rank tensor nature of the nonlinear susceptibilities involved. We also show that room-temperature PL originates from the radiative recombination of optical-phonon vibrationally excited polaronic quasiparticles whose energies might exceed the ground-state Fröhlich polaron and Rashba energies due to the optical-phonon bottleneck.

Preprint, 2013, arXiv

Regularized Discriminant Embedding for Visual Descriptor Learning

Images can vary according to changes in viewpoint, resolution, noise, and illumination. In this paper, we aim to learn image representations that are robust to wide changes in such environmental conditions, using training pairs of matching and non-matching local image patches collected under various environmental conditions. We present a regularized discriminant analysis that emphasizes two challenging categories among the given training pairs: (1) matching but far-apart pairs and (2) non-matching but close pairs in the original feature space (e.g., SIFT feature space). Compared to existing work on metric learning and discriminant analysis, our method can better distinguish relevant images from irrelevant but look-alike images.