Source author record

Wenbin Li

Wenbin Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; cond-mat.mtrl-sci; Artificial Intelligence; Machine Learning; cond-mat.mes-hall; eess.IV; Robotics; Computation and Language; cond-mat.other; cond-mat.soft; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Human-Computer Interaction; math.NA; Neural and Evolutionary Computing; Numerical Analysis; physics.app-ph; physics.comp-ph

Catalog footprint

What is connected

41 works

18 topics

4 close collaborators

preprint 2026 arXiv

Ab initio study of carrier mobility in Bi$_2$O$_2$Se

Bi$_2$O$_2$Se is an emerging high-performance layered semiconductor with excellent stability. While experimental studies have explored carrier transport across various doping levels for both $n$-type and $p$-type conduction, a comprehensive theoretical understanding remains incomplete. In this work, we present parameter-free first-principles calculations of the electron and hole mobilities in Bi$_2$O$_2$Se, based on iterative solution of the Boltzmann transport equation that includes electron-phonon scattering and ionized impurity scattering on an equal footing. Intriguingly, we find that Bi$_2$O$_2$Se exhibits high electron mobilities in both the in-plane and out-of-plane directions, whereas the hole mobilities are only significant in the in-plane direction, displaying unique three-dimensional (3D) electron transport and two-dimensional (2D) hole transport behavior. At 300~K, the calculated intrinsic electron and hole mobilities along the in-plane direction are 447~$\mathrm{cm^2\,V^{-1}\,s^{-1}}$ and 29~$\mathrm{cm^2\,V^{-1}\,s^{-1}}$, respectively, which are primarily affected by Fröhlich electron-phonon interactions. Due to its large static dielectric permittivity, Bi$_2$O$_2$Se exhibits exceptionally high low-temperature electron mobilities above $1.0\times10^5~\mathrm{cm^2\,V^{-1}\,s^{-1}}$, and its electron mobility above 50~K is robust against ionized impurity scattering over a wide range of impurity concentrations. By incorporating the Hall effect into our analysis, we predict an in-plane electron Hall mobility of 517~$\mathrm{cm^2\,V^{-1}\,s^{-1}}$ at 300~K, in excellent agreement with experimental data. These results provide valuable insights into the carrier transport mechanisms in Bi$_2$O$_2$Se, and offer predictive benchmarks for future theoretical and experimental investigations.

preprint 2026 arXiv

AviationLMM: A Large Multimodal Foundation Model for Civil Aviation

Civil aviation is a cornerstone of global transportation and commerce, and ensuring its safety, efficiency and customer satisfaction is paramount. Yet conventional Artificial Intelligence (AI) solutions in aviation remain siloed and narrow, focusing on isolated tasks or single modalities. They struggle to integrate heterogeneous data such as voice communications, radar tracks, sensor streams and textual reports, which limits situational awareness, adaptability, and real-time decision support. This paper introduces the vision of AviationLMM, a Large Multimodal foundation Model for civil aviation, designed to unify the heterogeneous data streams of civil aviation and enable understanding, reasoning, generation and agentic applications. We first identify the gaps between existing AI solutions and requirements. Second, we describe a model architecture that ingests multimodal inputs such as air-ground voice, surveillance, on-board telemetry, video and structured texts, performs cross-modal alignment and fusion, and produces flexible outputs ranging from situation summaries and risk alerts to predictive diagnostics and multimodal incident reconstructions. To fully realize this vision, we identify key research opportunities, including data acquisition, alignment and fusion, pretraining, reasoning, trustworthiness, privacy, robustness to missing modalities, and synthetic scenario generation. By articulating the design and challenges of AviationLMM, we aim to advance progress on civil aviation foundation models and catalyze coordinated research efforts toward an integrated, trustworthy and privacy-preserving aviation AI ecosystem.

preprint 2026 arXiv

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies is enabling these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially in the limited exploration subspace of PEFT. To overcome this challenge, this paper pioneers a systematic exploration of adopting Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify the incompatibility of naive full-ZO adoption in VLCL due to instability of the optimization process. We then investigate the application of ZO optimization at granularities ranging from modality branch-wise to fine-grained layer-wise across various training units to identify an optimal strategy. In addition, a key theoretical insight reveals that the vision modality exhibits higher variance than its language counterpart in VLCL during the ZO optimization process, and we propose a modality-aware ZO strategy, which adopts gradient sign normalization in ZO and constrains vision modality perturbation to further improve performance. With ZO optimization, PEFT-based VLCL is better able to escape local minima during the optimization process. Extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art results.
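As a rough illustration of the zeroth-order idea mentioned in this abstract, the sketch below estimates a gradient from two loss evaluations along one random direction and then takes a sign-normalized step. It is not the paper's method: the function names (`zo_gradient`, `zo_sign_step`, `run_demo`), the toy quadratic loss, the step size and the reading of "gradient sign normalization" as an elementwise sign are all assumptions made for illustration.

```python
import random


def zo_gradient(loss, params, eps=1e-3, rng=random):
    """Two-point zeroth-order gradient estimate (no backpropagation).

    Samples one Gaussian direction z and returns
        (loss(p + eps*z) - loss(p - eps*z)) / (2*eps) * z
    """
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    scale = (loss(plus) - loss(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def zo_sign_step(loss, params, lr=0.01, eps=1e-3, rng=random):
    """One update that uses only the sign of the ZO estimate (assumed reading)."""
    g = zo_gradient(loss, params, eps, rng)
    return [p - lr * sign(gi) for p, gi in zip(params, g)]


def quadratic(params):
    """Toy loss standing in for a real training objective (hypothetical)."""
    return sum(p * p for p in params)


def run_demo(steps=500, seed=0):
    """Run ZO sign steps on the toy loss; return (initial loss, final loss)."""
    rng = random.Random(seed)
    params = [1.0, -2.0, 0.5]
    start = quadratic(params)
    for _ in range(steps):
        params = zo_sign_step(quadratic, params, rng=rng)
    return start, quadratic(params)
```

Calling `run_demo()` returns the loss before and after the updates. In a modality-aware variant one might, for example, pass a smaller `eps` for vision-branch parameters, but that detail is a guess and not something the abstract specifies.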

preprint 2026 arXiv

DCVD: Dual-Channel Cross-Modal Fusion for Joint Vulnerability Detection and Localization

Software vulnerability detection plays a critical role in ensuring system security, where real-world auditing requires not only determining whether a function is vulnerable but also pinpointing the specific lines responsible. However, existing approaches either rely on a single information source -- sequential, structural, or semantic -- failing to jointly exploit the complementary strengths across modalities, or treat statement-level localization merely as a byproduct of function-level detection without explicit line-level supervision. To address these limitations, we propose DCVD (Dual-Channel Cross-Modal Vulnerability Detection), a unified framework that performs joint function-level detection and statement-level localization. DCVD extracts control-dependency and semantic features through two parallel branches and integrates them via contrastive alignment coupled with bidirectional cross-attention, effectively bridging the cross-modal representation gap. It further introduces explicit supervision signals at both the function and statement levels, enabling collaborative optimization across the two granularities. Extensive experiments on a large-scale real-world vulnerability benchmark demonstrate that DCVD consistently outperforms state-of-the-art methods on both function-level detection and statement-level localization. Our code is available at https://github.com/vinsontang1/DCVD.

preprint 2026 arXiv

Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation

Vision foundation models (VFMs) have achieved strong performance across various vision tasks. However, it still remains challenging to apply VFMs for cross-domain few-shot segmentation (CD-FSS), which segments objects of novel classes under domain shifts using only a few labeled exemplars. The challenge is mainly driven by two factors: (1) limited labeled exemplars per novel class relative to the scale of VFM pre-training, making the model prone to overfitting during retraining, and (2) target-domain shifts underrepresented during pre-training, inducing cross-domain inconsistency and layer-wise sensitivity. To address these issues, we propose Hierarchical Exemplar Representation Adaptation (HERA), a three-stage select-regularize-calibrate VFM-based segmentation framework that learns effectively from limited labels and adapts to novel domains without source-data retraining. We first design Hierarchical Layer Selection (HLS) to adaptively identify the most informative VFM layer using a data-dependent Exemplar Transfer Risk (ETR) computed for each candidate layer. Then, Prior-Guided Regularization (PGR) regularizes interactions on the selected representation, yielding well-structured local signals for the subsequent stage. Furthermore, Pixelwise Adaptive Calibration (PAC) combines the selected representation with the refined interaction maps to calibrate pixel-wise predictions, producing consistent masks. Together, these stages form a hierarchical select-regularize-calibrate pipeline that guides frozen VFM features in new domains while fine-tuning less than 2.7% of parameters at test time. Extensive experiments show that HERA surpasses the state of the art by more than 4.1 mIoU across multiple CD-FSS benchmarks.

preprint 2026 arXiv

Training-Free Video Editing via Optical Flow-Enhanced Score Distillation

The rapid advancement in visual generation, particularly the emergence of pre-trained text-to-image and text-to-video models, has catalyzed growing interest in training-free video editing research. Mirroring training-free image editing techniques, current approaches preserve original video information by inverting the input video and manipulating intermediate features and attention during inference to achieve content editing. Although they have demonstrated promising results, the lossy nature of the inversion process poses significant challenges in maintaining unedited regions of the video. Furthermore, feature and attention manipulation during inference can lead to unintended over-editing and faces challenges in both local temporal continuity and global content consistency. To address these challenges, this study proposes a score distillation paradigm based on pre-trained text-to-video models, where the original video is iteratively optimized over multiple steps, guided by editing gradients provided by score distillation, to ultimately obtain the target video. The iterative optimization starting from the original video, combined with a content preservation loss, ensures the maintenance of unedited regions in the original video and suppresses over-editing. To further guarantee video content consistency and temporal continuity, we additionally introduce a global consistency auxiliary loss and optical flow prediction-based local editing gradient smoothing. Experiments demonstrate that these strategies effectively address the aforementioned challenges, achieving comparable or superior performance across multiple dimensions including preservation of unedited regions, local temporal continuity, and global content consistency of editing results, compared to state-of-the-art methods.

preprint 2025 arXiv

Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining

Cross-domain few-shot segmentation (CD-FSS) aims to segment objects of novel classes in new domains, which is often challenging due to the diverse characteristics of target domains and the limited availability of support data. Most CD-FSS methods redesign and retrain in-domain FSS models using abundant base data from the source domain, which are effective but costly to train. To address these issues, we propose adapting informative model structures of the well-trained FSS model for target domains by learning domain characteristics from few-shot labeled support samples during inference, thereby eliminating the need for source domain retraining. Specifically, we first adaptively identify domain-specific model structures by measuring parameter importance using a novel structure Fisher score in a data-dependent manner. Then, we progressively train the selected informative model structures with hierarchically constructed training samples, progressing from fewer to more support shots. The resulting Informative Structure Adaptation (ISA) method effectively addresses domain shifts and equips existing well-trained in-domain FSS models with flexible adaptation capabilities for new domains, eliminating the need to redesign or retrain CD-FSS models on base data. Extensive experiments validate the effectiveness of our method, demonstrating superior performance across multiple CD-FSS benchmarks. Codes are at https://github.com/fanq15/ISA.

preprint 2025 arXiv

FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing

Large vision-language models (VLMs) exhibit strong performance across various tasks. However, these VLMs encounter significant challenges when applied to the remote sensing domain due to the inherent differences between remote sensing images and natural images. Existing remote sensing VLMs often fail to extract fine-grained visual features and suffer from visual forgetting during deep language processing. To address this, we introduce MF-RSVLM, a Multi-Feature Fusion Remote Sensing Vision--Language Model that effectively extracts and fuses visual features for RS understanding. MF-RSVLM learns multi-scale visual representations and combines global context with local details, improving the capture of small and complex structures in RS scenes. A recurrent visual feature injection scheme ensures the language model remains grounded in visual evidence and reduces visual forgetting during generation. Extensive experiments on diverse RS benchmarks show that MF-RSVLM achieves state-of-the-art or highly competitive performance across remote sensing classification, image captioning, and VQA tasks. Our code is publicly available at https://github.com/Yunkaidang/RSVLM.

preprint 2024 arXiv

AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis

Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subjective decisions, uni-modal outputs, as well as privacy issues related to sensitive data. This paper introduces the idea of AccidentGPT, a foundation model for traffic accident analysis, which incorporates multi-modal input data to automatically reconstruct a video of the accident process with dynamic details, and furthermore provides multi-task analysis with multi-modal outputs. The design of AccidentGPT is empowered with a multi-modality prompt with feedback for task-oriented adaptability, a hybrid training schema to leverage labelled and unlabelled data, and an edge-cloud split configuration for data privacy. To fully realize the functionalities of this model, we propose several research opportunities. This paper serves as a stepping stone to fill the gaps in traditional approaches to traffic accident analysis and to attract the research community's attention to automatic, objective, and privacy-preserving traffic accident analysis.

preprint 2022 arXiv

A stochastic gradient descent approach with partitioned-truncated singular value decomposition for large-scale inverse problems of magnetic modulus data

We propose a stochastic gradient descent approach with partitioned-truncated singular value decomposition for large-scale inverse problems of magnetic modulus data. Motivated by a uniqueness theorem for the gravity inverse problem and recognizing the similarity between gravity and magnetic inverse problems, we propose to solve for the level-set function modeling the volume susceptibility distribution from the nonlinear magnetic modulus data. To deal with large-scale data, we employ a mini-batch stochastic gradient descent approach with random reshuffling when solving the optimization problem of the inverse problem. We propose a stepsize rule for the stochastic gradient descent according to the Courant-Friedrichs-Lewy condition of the evolution equation. In addition, we develop a partitioned-truncated singular value decomposition algorithm for the linear part of the inverse problem in the context of stochastic gradient descent. Numerical examples illustrate the efficacy of the proposed method, which proves capable of efficiently processing large-scale measurement data for the magnetic inverse problem. A possible generalization to the inverse problem of deep neural networks is discussed at the end.
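The phrase "mini-batch stochastic gradient descent with random reshuffling" can be made concrete with a small sketch: each epoch draws a fresh permutation of the data and walks through it in batches, so every sample is used exactly once per epoch. The example below is only a toy under stated assumptions. It fits a one-parameter linear model with a fixed step size, and the names `reshuffled_batches` and `sgd_fit_slope` are hypothetical. The paper's CFL-based stepsize rule, level-set formulation and partitioned-truncated SVD are not reproduced here.

```python
import random


def reshuffled_batches(n, batch_size, rng):
    """Yield index batches covering 0..n-1 exactly once, in a fresh random order."""
    order = list(range(n))
    rng.shuffle(order)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]


def sgd_fit_slope(xs, ys, epochs=20, batch_size=4, lr=0.1, seed=0):
    """Fit y ~= w * x by mini-batch SGD on the mean squared error.

    The fixed step size `lr` is an assumption of this toy example.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        # Random reshuffling: a new permutation of the data at every epoch.
        for batch in reshuffled_batches(len(xs), batch_size, rng):
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


# Noise-free toy data generated with slope 3.
xs = [i / 10.0 for i in range(1, 21)]
ys = [3.0 * x for x in xs]
```

Calling `sgd_fit_slope(xs, ys)` recovers a slope close to 3 on this noise-free data. Sampling without replacement within an epoch is what distinguishes random reshuffling from drawing each batch independently.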

preprint 2022 arXiv

Direct visualization of ultrafast lattice ordering triggered by an electron-hole plasma in 2D perovskites

Direct visualization of ultrafast coupling between charge carriers and lattice degrees of freedom in photo-excited semiconductors has remained a long-standing challenge and is critical for understanding the light-induced physical behavior of materials under extreme non-equilibrium conditions. Here, by monitoring the evolution of the wave-vector resolved ultrafast electron diffraction intensity following above-bandgap photo-excitation, we obtain a direct visual of the structural dynamics in monocrystalline 2D perovskites. Analysis reveals a surprising, light-induced ultrafast lattice ordering resulting from a strong interaction between hot-carriers and the perovskite lattice, which induces an in-plane octahedra rotation, towards a more symmetric phase. Correlated ultrafast spectroscopy performed at the same carrier density as ultrafast electron diffraction reveals that the creation of a hot and dense electron-hole plasma triggers lattice ordering at short timescales by modulating the crystal cohesive energy. Finally, we show that the interaction between the carrier gas and the lattice can be altered by tailoring the rigidity of the 2D perovskite by choosing the appropriate organic spacer layer.

preprint 2022 arXiv

Giant modulation of the electron mobility in semiconductor Bi$_2$O$_2$Se via incipient ferroelectric phase transition

High-mobility layered semiconductors have the potential to enable the next-generation electronics and computing. This paper demonstrates that the ultrahigh electron mobility observed in the layered semiconductor Bi$_2$O$_2$Se originates from an incipient ferroelectric transition that endows the material with a robust protection against mobility degradation by Coulomb scattering. Based on first-principles calculations of electron-phonon interaction and ionized impurity scattering, it is shown that the electron mobility of Bi$_2$O$_2$Se can reach 10$^4$ to 10$^6$ cm$^2$V$^{-1}$s$^{-1}$ over a wide range of realistic doping concentrations. Furthermore, a small elastic strain of 1.7% can drive the material toward a unique interlayer ferroelectric transition, resulting in a large increase in the dielectric permittivity and a giant enhancement of the low-temperature electron mobility by more than an order of magnitude. These results establish a new route to realize high-mobility layered semiconductors via phase and dielectric engineering.

preprint 2022 arXiv

Keeping Minimal Experience to Achieve Efficient Interpretable Policy Distillation

Although deep reinforcement learning has become a universal solution for complex control tasks, its real-world applicability is still limited because policies lack security guarantees. To address this problem, we propose Boundary Characterization via the Minimum Experience Retention (BCMER), an end-to-end Interpretable Policy Distillation (IPD) framework. Unlike previous IPD approaches, BCMER distinguishes the importance of experiences and keeps a minimal but critical experience pool with almost no loss of policy similarity. Specifically, the proposed BCMER contains two basic steps. First, we propose a novel multidimensional hyperspheres intersection (MHI) approach to divide experience points into boundary points and internal points, and retain the crucial boundary points. Second, we develop a nearest-neighbor-based model to generate robust and interpretable decision rules based on the boundary points. Extensive experiments show that the proposed BCMER is able to reduce the amount of experience to between 1.4% and 19.1% (when the count of the naive experiences is 10k) and maintain high IPD performance. In general, the proposed BCMER is better suited to regimes with limited experience storage because it discovers the critical experience and eliminates redundant experience.

preprint 2022 arXiv

LibFewShot: A Comprehensive Library for Few-shot Learning

Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years. Some recent studies implicitly show that many generic techniques or "tricks", such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method. Moreover, different works may employ different software platforms, backbone architectures and input image sizes, making fair comparisons difficult and leaving practitioners struggling with reproducibility. To address these situations, we propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing eighteen state-of-the-art few-shot learning methods in a unified framework with a single codebase in PyTorch. Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmarks with various backbone architectures to evaluate common pitfalls and effects of different training tricks. In addition, with respect to the recent doubts on the necessity of a meta- or episodic-training mechanism, our evaluation results confirm that such a mechanism is still necessary, especially when combined with pre-training. We hope our work can not only lower the barriers for beginners to enter the area of few-shot learning but also elucidate the effects of nontrivial tricks to facilitate intrinsic research on few-shot learning. The source code is available from https://github.com/RL-VIG/LibFewShot.

preprint 2022 arXiv

Playing Lottery Tickets in Style Transfer Models

Style transfer has achieved great success and attracted a wide range of attention from both academic and industrial communities due to its flexible application scenarios. However, the dependence on a rather large VGG-based autoencoder leads to existing style transfer models having high parameter complexities, which limits their applications on resource-constrained devices. Compared with many other tasks, the compression of style transfer models has been less explored. Recently, the lottery ticket hypothesis (LTH) has shown great potential in finding extremely sparse matching subnetworks which can achieve on par or even better performance than the original full networks when trained in isolation. In this work, we perform, for the first time, an empirical study to verify whether such trainable matching subnetworks also exist in style transfer models. Specifically, we take two of the most popular style transfer models, i.e., AdaIN and SANet, as the main testbeds, which represent global and local transformation based style transfer methods respectively. We carry out extensive experiments and comprehensive analysis, and draw the following conclusions. (1) Compared with fixing the VGG encoder, style transfer models can benefit more from training the whole network together. (2) Using iterative magnitude pruning, we find the matching subnetworks at 89.2% sparsity in AdaIN and 73.7% sparsity in SANet, which demonstrates that style transfer models can play lottery tickets too. (3) The feature transformation module should also be pruned to obtain a much sparser model without affecting the existence and quality of the matching subnetworks. (4) Besides AdaIN and SANet, other models such as LST, MANet, AdaAttN and MCCNet can also play lottery tickets, which shows that LTH can be generalized to various style transfer models.
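Iterative magnitude pruning, as named in conclusion (2), is commonly described as a loop of train, prune the smallest-magnitude surviving weights, rewind the survivors to their initial values, and repeat. The sketch below shows that loop on a flat list of weights. It is a simplified illustration rather than the paper's pipeline: `prune_smallest`, `iterative_magnitude_pruning`, the per-round pruning `rate` and the `train` callback are hypothetical names and parameters, and real models would prune per layer or globally over tensors.

```python
def prune_smallest(weights, mask, rate):
    """Zero out the fraction `rate` of still-active weights with smallest magnitude."""
    alive = [i for i, m in enumerate(mask) if m]
    k = int(len(alive) * rate)  # number of weights removed this round
    drop = sorted(alive, key=lambda i: abs(weights[i]))[:k]
    new_mask = list(mask)
    for i in drop:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init_weights, train, rounds, rate):
    """Return a 0/1 mask found by repeated train -> prune -> rewind rounds.

    `train(weights, mask)` is a user-supplied (hypothetical) callback that
    returns trained weights; masked positions are expected to stay at zero.
    """
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: every round restarts from the original initialization.
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = train(start, mask)
        mask = prune_smallest(trained, mask, rate)
    return mask
```

With an identity `train` callback, `iterative_magnitude_pruning([0.5, -0.1, 0.3, -0.9, 0.05], lambda w, m: w, rounds=2, rate=0.4)` keeps only the two largest-magnitude weights, a sparsity of 60%. A subnetwork is then judged "matching" by retraining it in isolation and comparing against the full model, which this sketch leaves to the caller.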

preprint 2022 arXiv

Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation

In this paper, we mainly focus on the problem of how to learn additional feature representations for few-shot image classification through pretext tasks (e.g., rotation or color permutation). This additional knowledge generated by pretext tasks can further improve the performance of few-shot learning (FSL) as it differs from human-annotated supervision (i.e., class labels of FSL tasks). To solve this problem, we present a plug-in Hierarchical Tree Structure-aware (HTS) method, which not only learns the relationship between FSL and pretext tasks, but more importantly, can adaptively select and aggregate feature representations generated by pretext tasks to maximize the performance of FSL tasks. A hierarchical tree constructing component and a gated selection aggregating component are introduced to construct the tree structure and find richer transferable knowledge that can rapidly adapt to novel classes with a few labeled images. Extensive experiments show that our HTS can significantly enhance multiple few-shot methods to achieve new state-of-the-art performance on four benchmark datasets. The code is available at: https://github.com/remiMZ/HTS-ECCV22.

preprint 2020 arXiv

Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification

Skin disease classification from images is crucial to dermatological diagnosis. However, identifying skin lesions involves a variety of aspects in terms of size, color, shape, and texture. To make matters worse, many categories only contain very few samples, posing great challenges to conventional machine learning algorithms and even human experts. Inspired by the recent success of Few-Shot Learning (FSL) in natural image classification, we propose to apply FSL to skin disease identification to address the problem of extreme training-sample scarcity. However, directly applying FSL to this task does not work well in practice, and we find that the problem can be largely attributed to the incompatibility between Cross Entropy (CE) and episode training, which are both commonly used in FSL. Based on a detailed analysis, we propose the Query-Relative (QR) loss, which proves superior to CE under episode training and is closely related to recently proposed mutual information estimation. Moreover, we further strengthen the proposed QR loss with a novel adaptive hard margin strategy. Comprehensive experiments validate the effectiveness of the proposed FSL scheme and the possibility of diagnosing rare skin diseases with a few labeled samples.

preprint 2020 arXiv

Asymmetric Distribution Measure for Few-shot Learning

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which actually cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can achieve richer representations than image-level based representations. However, such works are still based on a less effective instance-level metric, especially a symmetric metric, to measure the relations between query images and support classes. Given the natural asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning by calculating a joint local and global asymmetric measure between two multivariate local distributions of queries and classes. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On popular miniImageNet and tieredImageNet, we achieve $3.02\%$ and $1.56\%$ gains over the state-of-the-art method on the $5$-way $1$-shot task, respectively, validating our innovative design of asymmetric distribution measures for few-shot learning.

preprint 2020 arXiv

Diversity Helps: Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation

Few-shot learning aims to learn a new concept when only a few training examples are available, which has been extensively explored in recent years. However, most current works rely heavily on a large-scale labeled auxiliary set to train their models in an episodic-training paradigm. This kind of supervised setting limits the widespread use of few-shot learning algorithms. Instead, in this paper, we develop a novel framework called Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation (ULDA), which pays attention to the distribution diversity inside each constructed pretext few-shot task when using data augmentation. Importantly, we highlight the value and importance of the distribution diversity in the augmentation-based pretext few-shot tasks, which can effectively alleviate the overfitting problem and make the few-shot model learn more robust feature representations. In ULDA, we systematically investigate the effects of different augmentation techniques and propose to strengthen the distribution diversity (or difference) between the query set and support set in each few-shot task, by augmenting these two sets diversely (i.e., distribution shifting). In this way, even when combined with simple augmentation techniques (e.g., random crop, color jittering, or rotation), our ULDA can produce a significant improvement. In the experiments, few-shot models learned by ULDA can achieve superior generalization performance and obtain state-of-the-art results in a variety of established few-shot learning tasks on Omniglot and miniImageNet. The source code is available at https://github.com/WonderSeven/ULDA.

preprint 2020 arXiv

Embedded Deep Bilinear Interactive Information and Selective Fusion for Multi-view Learning

As a concrete application of multi-view learning, multi-view classification improves the traditional classification methods significantly by integrating various views optimally. Although most previous efforts have demonstrated the superiority of multi-view learning, it can be further improved by embedding more powerful cross-view interactive information and a more reliable multi-view fusion strategy. To fulfill this goal, we propose a novel multi-view learning framework that improves multi-view classification in these two aspects. That is, we seamlessly embed various intra-view information, cross-view multi-dimension bilinear interactive information, and a new view ensemble mechanism into a unified framework that makes decisions via optimization. In particular, we train different deep neural networks to learn various intra-view representations, and then dynamically learn multi-dimension bilinear interactive information from different bilinear similarities via the bilinear function between views. After that, we adaptively fuse the representations of multiple views by flexibly tuning the view-weight parameters, which not only avoids the trivial solution for the weights but also provides a new way to select a few discriminative views that are beneficial for decision-making in multi-view classification. Extensive experiments on six publicly available datasets demonstrate the effectiveness of the proposed method.

preprint 2020 arXiv

Experimental evidence of monolayer AlB$_2$ with symmetry-protected Dirac cones

Monolayer AlB$_2$ is composed of two atomic layers: honeycomb borophene and triangular aluminum. In contrast with the bulk phase, monolayer AlB$_2$ is predicted to be a superconductor with a high critical temperature. Here, we demonstrate that monolayer AlB$_2$ can be synthesized on Al(111) via molecular beam epitaxy. Our theoretical calculations revealed that the monolayer AlB$_2$ hosts several Dirac cones along the $Γ$--M and $Γ$--K directions; these Dirac cones are protected by crystal symmetries and are thus resistant to external perturbations. The extraordinary electronic structure of the monolayer AlB$_2$ was confirmed via angle-resolved photoemission spectroscopy measurements. These results are likely to stimulate further research interest to explore the exotic properties arising from the interplay of Dirac fermions and superconductivity in two-dimensional materials.

preprint2020arXiv

RGBD-Dog: Predicting Canine Pose from RGBD Sensors

The automatic extraction of animal 3D pose from images without markers is of interest in a range of scientific fields. Most work to date predicts animal pose from RGB images, based on 2D labelling of joint positions. However, due to the difficult nature of obtaining training data, no ground truth dataset of 3D animal motion is available to quantitatively evaluate these approaches. In addition, a lack of 3D animal pose data also makes it difficult to train 3D pose-prediction methods in a similar manner to the popular field of body-pose prediction. In our work, we focus on the problem of 3D canine pose estimation from RGBD images, recording a diverse range of dog breeds with several Microsoft Kinect v2s, simultaneously obtaining the 3D ground truth skeleton via a motion capture system. We generate a dataset of synthetic RGBD images from this data. A stacked hourglass network is trained to predict 3D joint locations, which is then constrained using prior models of shape and pose. We evaluate our model on both synthetic and real RGBD images and compare our results to previously published work fitting canine models to images. Finally, despite our training set consisting only of dog data, visual inspection implies that our network can produce good predictions for images of other quadrupeds -- e.g. horses or cats -- when their pose is similar to that contained in our training set.

preprint2020arXiv

Semantic Regularization: Improve Few-shot Image Classification by Reducing Meta Shift

Few-shot image classification requires the classifier to robustly cope with unseen classes even if there are only a few samples for each class. Recent advances benefit from the meta-learning process where episodic tasks are formed to train a model that can adapt to class change. However, these tasks are independent of each other, and existing works mainly rely on the limited samples of an individual support set in a single meta task. This strategy leads to severe meta shift issues across multiple tasks, meaning the learned prototypes or class descriptors are not stable, as each task only involves its own support set. To avoid this problem, we propose a concise Semantic Regularization Network to learn a common semantic space under the framework of meta-learning. In this space, all class descriptors can be regularized by the learned semantic basis, which can effectively solve the meta shift problem. The key is to train a class encoder and decoder structure that can encode the sample embedding features into the semantic domain with a trained semantic basis, and generate a more stable and general class descriptor from the decoder. We evaluate our work by extensive comparisons with previous methods on three benchmark datasets (MiniImageNet, TieredImageNet, and CUB). The results show that the semantic regularization module improves performance by 4%-7% over the baseline method, and achieves competitive results compared with the current state-of-the-art models.

preprint2019arXiv

First-principles calculations of charge carrier mobility and conductivity in bulk semiconductors and two-dimensional materials

One of the fundamental properties of semiconductors is their ability to support highly tunable electric currents in the presence of electric fields or carrier concentration gradients. These properties are described by transport coefficients such as electron and hole mobilities. Recently, advances in electronic structure methods for real materials have made it possible to study these properties with predictive accuracy and without resorting to empirical parameters. Here, we review the most recent developments in the area of ab initio calculations of carrier mobilities of semiconductors. In the first part, we offer a brief historical overview of approaches to the calculation of carrier mobilities, and we establish the conceptual framework underlying modern ab initio approaches. We summarize the Boltzmann theory of carrier transport and we discuss its scope of applicability, merits, and limitations in the broader context of many-body Green's function approaches. We discuss recent implementations of the Boltzmann formalism within the context of density functional theory and many-body perturbation theory calculations, placing an emphasis on the key computational challenges and suggested solutions. In the second part, we discuss recent investigations of classic materials such as silicon, diamond, GaAs, GaN, Ga2O3, and lead halide perovskites as well as low-dimensional semiconductors such as graphene, silicene, phosphorene, MoS2, and InSe. We also review recent efforts toward high-throughput calculations of carrier transport. In the last part, we discuss the extension of the methodology to study spintronics and topological materials and we comment on the possibility of incorporating Berry-phase effects and many-body correlations beyond the standard Boltzmann formalism.

preprint2016arXiv

Blur Robust Optical Flow using Motion Channel

It is hard to estimate optical flow given a real-world video sequence with camera shake and other motion blur. In this paper, we first investigate the blur parameterization for video footage using near linear motion elements. We then combine a commercial 3D pose sensor with an RGB camera, in order to film video footage of interest together with the camera motion. We illustrate that this additional camera motion/trajectory channel can be embedded into a hybrid framework by interleaving an iterative blind deconvolution and warping based optical flow scheme. Our method yields improved accuracy relative to three other state-of-the-art baselines on our proposed ground truth blurry sequences and on several other real-world sequences filmed by our imaging system.

preprint2016arXiv

Dense Motion Estimation for Smoke

Motion estimation for highly dynamic phenomena such as smoke is an open challenge for Computer Vision. Traditional dense motion estimation algorithms have difficulties with non-rigid and large motions, both of which are frequently observed in smoke motion. We propose an algorithm for dense motion estimation of smoke. Our algorithm is robust, fast, and has better performance over different types of smoke compared to other dense motion estimation algorithms, including state of the art and neural network approaches. The key to our contribution is to use skeletal flow, without explicit point matching, to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this paper we describe our algorithm in greater detail, and provide experimental evidence to support our claims.

preprint2016arXiv

Drift Robust Non-rigid Optical Flow Enhancement for Long Sequences

It is hard to densely track a nonrigid object over the long term, which is a fundamental research issue in the computer vision community. This task often relies on estimating pairwise correspondences between images over time, where the error accumulates and leads to drift. In this paper, we introduce a novel optimization framework with an Anchor Patch constraint, designed to significantly reduce overall errors on long sequences containing non-rigidly deformable objects. Our framework can be applied to any dense tracking algorithm, e.g. optical flow. We demonstrate the success of our approach by showing significant error reduction on 6 popular optical flow algorithms applied to a range of real-world nonrigid benchmarks. We also provide quantitative analysis of our approach given synthetic occlusions and image noise.

preprint2016arXiv

Nonrigid Optical Flow Ground Truth for Real-World Scenes with Time-Varying Shading Effects

In this paper we present a dense ground truth dataset of nonrigidly deforming real-world scenes. Our dataset contains both long and short video sequences, and enables quantitative evaluation of RGB based tracking and registration methods. To construct ground truth for the RGB sequences, we simultaneously capture Near-Infrared (NIR) image sequences where dense markers - visible only in NIR - represent ground truth positions. This allows for comparison with automatically tracked RGB positions and the formation of error metrics. Most previous datasets containing nonrigidly deforming sequences are based on synthetic data. Our capture protocol enables us to acquire real-world deforming objects with realistic photometric effects - such as blur and illumination change - as well as occlusion and complex deformations. A public evaluation website is constructed to allow for ranking of RGB image based optical flow and other dense tracking algorithms, with various statistical measures. Furthermore, we present an RGB-NIR multispectral optical flow model allowing for energy optimization by adaptively combining feature information from both the RGB and the complementary NIR channels. In our experiments we evaluate eight existing RGB based optical flow methods on our new dataset. We also evaluate our hybrid optical flow algorithm by comparing to two existing multispectral approaches, as well as varying our input channels across RGB, NIR and RGB-NIR.

preprint2016arXiv

OPML: A One-Pass Closed-Form Solution for Online Metric Learning

To achieve a low computational cost when performing online metric learning for large-scale data, we present a one-pass closed-form solution, namely OPML, in this paper. The proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of the whole set of original triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for newly arriving samples, which leads to a low space (i.e., $O(d)$) and time (i.e., $O(d^2)$) complexity, where $d$ is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance the robustness when, in real cases, the first several samples come from the same class (i.e., the cold start problem). In the experiments, we have systematically evaluated our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, which aim to fully evaluate the proposed methods on different sample numbers, different feature dimensionalities and different feature extraction methods (i.e., hand-crafted and deeply-learned). The results show that OPML and COPML can obtain promising performance with a very low computational cost. Also, the effectiveness of COPML under the cold start setting is experimentally verified.

preprint2016arXiv

To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction

Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. Developmental psychology has shown that such skills are acquired by infants from observations at a very early stage. In this paper, we contrast a more traditional approach of taking a model-based route with explicit 3D representations and physical simulation with an end-to-end approach that directly predicts stability and related quantities from appearance. We ask if, and to what extent and quality, such a skill can directly be acquired in a data-driven way, bypassing the need for an explicit simulation. We present a learning-based approach based on simulated data that predicts stability of towers comprised of wooden blocks under different conditions and quantities related to the potential fall of the towers. The evaluation is carried out on synthetic data and compared to human judgments on the same stimuli.

preprint2016arXiv

Towards the Design of Effective Freehand Gestural Interaction for Interactive TV

As interactive devices become pervasive, people are beginning to look for more advanced interaction with televisions in the living room. Interactive television has the potential to offer a very engaging experience. But most common user tasks, such as menu selection or text input, are still challenging with such systems. And little work has been done on understanding and supporting the effective design of freehand interaction with a TV in the living room. In this paper, we perform two studies investigating freehand gestural interaction with a consumer level sensor, which is suitable for TV scenarios. In the first study, we investigate a range of design factors for tiled layout menu selection, including wearable feedback, push gesture depth, target size and position in motor space. The results show that tactile and audio feedback have no significant effect on performance and preference, and these results inform potential designs for high selection performance. In the second study, we investigate a common TV user task of text input using freehand gesture. We design and evaluate two virtual keyboard layouts and three freehand selection methods. Results show that ease of use and error tolerance can both be achieved using a text entry method utilizing a dual circle layout and an expanding target selection technique. Finally, we propose design guidelines for effective, usable and comfortable freehand gestural interaction for interactive TV based on the findings.

preprint2016arXiv

Video Interpolation using Optical Flow and Laplacian Smoothness

Non-rigid video interpolation is a common computer vision task. In this paper we present an optical flow approach which adopts a Laplacian Cotangent Mesh constraint to enhance the local smoothness. Similar to Li et al., our approach adopts a mesh to the image with a resolution up to one vertex per pixel and uses angle constraints to ensure sensible local deformations between image pairs. The Laplacian Mesh constraints are expressed wholly inside the optical flow optimization, and can be applied in a straightforward manner to a wide range of image tracking and registration problems. We evaluate our approach by testing on several benchmark datasets, including the Middlebury and Garg et al. datasets. In addition, we show application of our method for constructing 3D Morphable Facial Models from dynamic 3D data.

preprint2016arXiv

Visual Stability Prediction and Its Application to Manipulation

Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. Developmental psychology has shown that such skills are acquired by infants from observations at a very early stage. In this paper, we contrast a more traditional approach of taking a model-based route with explicit 3D representations and physical simulation with an end-to-end approach that directly predicts stability from appearance. We ask if, and to what extent and quality, such a skill can directly be acquired in a data-driven way, bypassing the need for an explicit simulation at run-time. We present a learning-based approach based on simulated data that predicts stability of towers comprised of wooden blocks under different conditions and quantities related to the potential fall of the towers. We first evaluate the approach on synthetic data and compare the results to human judgments on the same stimuli. Further, we extend this approach to reason about future states of such towers, which in turn enables successful stacking.

preprint2015arXiv

Deformation-Driven Diffusion and Plastic Flow in Two-Dimensional Amorphous Granular Pillars

We report a combined experimental and simulation study of deformation-induced diffusion in compacted two-dimensional amorphous granular pillars, in which thermal fluctuations play a negligible role. The pillars, consisting of bidisperse cylindrical acetal plastic particles standing upright on a substrate, are deformed uniaxially and quasistatically by a rigid bar moving at a constant speed. The plastic flow and particle rearrangements in the pillars are characterized by computing the best-fit affine transformation strain and non-affine displacement associated with each particle between two stages of deformation. The non-affine displacement exhibits an exponential crossover from ballistic to diffusive behavior with respect to the cumulative deviatoric strain, indicating that in athermal granular packings, the cumulative deviatoric strain plays the role of time in thermal systems and drives effective particle diffusion. We further study the size-dependent deformation of the granular pillars by simulation, and find that different-sized pillars follow self-similar shape evolution during deformation. In addition, the yield stress of the pillars increases linearly with pillar size. Formation of transient shear lines in the pillars during deformation becomes more evident as pillar size increases. The width of these elementary shear bands is about twice the diameter of a particle, and does not vary with pillar size.

preprint2015arXiv

Experimental Realization of Two-Dimensional Boron Sheets

Boron is the fifth element in the periodic table and possesses rich chemistry second only to carbon. A striking feature of boron is that B12 icosahedral cages occur as the building blocks in bulk boron and many boron compounds. This is in contrast to its neighboring element, carbon, which prefers a 2D layered structure (graphite) in its bulk form. On the other hand, boron clusters of medium size have been predicted to be planar or quasi-planar, such as B12+, B13+, B19-, B36, and so on. This is also in contrast to carbon clusters, which exhibit various cage structures (fullerenes). Therefore, boron and carbon can be viewed as a set of complementary chemical systems in their bulk and cluster structures. Now, with the boom of graphene, an intriguing question is whether boron can also form a monoatomic-layer 2D sheet structure. Here, we report the first successful experimental realization of 2D boron sheets. We have revealed two types of boron sheet structures, corresponding to a triangular boron lattice with different arrangements of the hexagonal holes. Moreover, our boron sheets were found to be relatively stable against oxidization, and to interact only weakly with the substrate. The realization of such a long-expected 2D boron sheet could open a door toward boron electronics, analogous to the carbon electronics based on graphene.

preprint2015arXiv

Giant Piezoelectricity in Monolayer Group IV Monochalcogenides: SnSe, SnS, GeSe and GeS

We predict enormous piezoelectric effects in intrinsic monolayer group IV monochalcogenides (MX, M=Sn or Ge, X=Se or S), including SnSe, SnS, GeSe and GeS. Using first-principles simulations based on the modern theory of polarization, we find that their characteristic piezoelectric coefficients are about two orders of magnitude larger than those of other 2D materials, such as MoS2 and GaSe, and of bulk quartz and AlN, which are widely used in industry. This enhancement is a result of the unique "puckered" C2v symmetry and weaker chemical bonds of monolayer group IV monochalcogenides. Given the achieved experimental advances in fabrication of monolayers, their flexible character and ability to withstand enormous strain, these 2D structures with giant piezoelectric effects may be promising for a broad range of applications, such as nano-sized sensors, piezotronics, and energy harvesting in portable electronic devices.

preprint2015arXiv

Piezoelectricity in Two-Dimensional Group III Monochalcogenides

We find that several layer-phase group-III monochalcogenides, including GaS, GaSe and InSe, are piezoelectric in the monolayer form. First-principles calculations reveal that the piezoelectric coefficients of monolayer GaS, GaSe and InSe are on the same order of magnitude as the earlier discovered two-dimensional piezoelectric materials, such as BN and MoS2 monolayers. Our study expands the family of two dimensional piezoelectric materials, suggesting that strong piezoelectric response can occur in a wide range of two dimensional materials with broken inversion symmetry. The co-existence of piezoelectricity and superior photo-sensitivity in these two-dimensional semiconductors enables the integration of electromechanical and optical sensors on the same material platform.

preprint2015arXiv

Variable Coupling Strength of Silicene on Ag(111)

We performed a scanning tunneling microscopy and spectroscopy (STM/STS) study of the electronic structure of $\sqrt{3}\times\sqrt{3}$-silicene on Ag(111). We find that the coupling strength of $\sqrt{3}\times\sqrt{3}$-silicene with the Ag(111) substrate varies between regions, giving rise to notable effects in experiments. This evidence of decoupling or variable interaction of silicene with the substrate is helpful for an in-depth understanding of the structure and electronic properties of silicene.

preprint2014arXiv

Envelope function method for electrons in slowly-varying inhomogeneously deformed crystals

We develop a new envelope-function formalism to describe electrons in slowly-varying inhomogeneously strained semiconductor crystals. A coordinate transformation is used to map a deformed crystal back to a geometrically undeformed structure with a deformed crystal potential. The single-particle Schrödinger equation is solved in the undeformed coordinates using an envelope function expansion, wherein electronic wavefunctions are written in terms of strain-parametrized Bloch functions modulated by slowly varying envelope functions. Adopting a local approximation of the electronic structure, the unknown crystal potential in the Schrödinger equation can be replaced by the strain-parametrized Bloch functions and the associated strain-parametrized energy eigenvalues, which can be constructed from unit-cell level ab initio or semi-empirical calculations of homogeneously deformed crystals at a chosen crystal momentum. The Schrödinger equation is then transformed into a coupled differential equation for the envelope functions and solved as a generalized matrix eigenvector problem. As the envelope functions are slowly varying, a coarse spatial or Fourier grid can be used to represent the envelope functions, enabling the method to treat relatively large systems. We demonstrate the effectiveness of this method using a one-dimensional model, where we show that the method can achieve high accuracy in the calculation of energy eigenstates at relatively low cost compared to direct diagonalization of the Hamiltonian. We further derive envelope function equations that allow the method to be used empirically, in which case certain parameters in the envelope function equations will be fitted to experimental data.

preprint2014arXiv

Learning Multi-Scale Representations for Material Classification

The recent progress in sparse coding and deep learning has made unsupervised feature learning methods a strong competitor to hand-crafted descriptors. In computer vision, success stories of learned features have been predominantly reported for object recognition tasks. In this paper, we investigate if and how feature learning can be used for material recognition. We propose two strategies to incorporate scale information into the learning procedure resulting in a novel multi-scale coding procedure. Our results show that our learned features for material recognition outperform hand-crafted descriptors on the FMD and the KTH-TIPS2 material classification benchmarks.

preprint2014arXiv

Persistent Dirac Fermion State on Bulk-like Si(111) Surface

The "multilayer silicene" films were grown on Ag(111), with increasing thickness above 30 monolayers (ML). We found that the "multilayer silicene" is indeed a bulk Si(111) film. Such a Si film on Ag(111) always exhibits a $\sqrt{3}\times\sqrt{3}$ honeycomb superstructure on its surface. A delocalized surface state as well as a linear energy-momentum dispersion was revealed by quasiparticle interference (QPI) patterns on the surface, which proves the existence of a Dirac fermion state. Our results indicate that bulk silicon with diamond structure can also host Dirac fermions, which makes the system even more attractive for further applications compared with monolayer silicene.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2504.21414:author:6:wenbin-li

Imported May 21, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2512.24022:author:9:wenbin-li

Imported May 21, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2605.11015:author:2:wenbin-li

Imported May 20, 2026; synced May 20, 2026

arxiv, confidence 95%

external id: arxiv:2605.19340:author:3:wenbin-li

Imported May 20, 2026; synced May 20, 2026

8 works

Yang Gao

Researcher

5 works

Darren Cosker

Researcher

5 works

Jing Huo

Researcher

4 works

Baojie Feng

Researcher

41 published item(s)

Ab initio study of carrier mobility in Bi$_2$O$_2$Se

AviationLMM: A Large Multimodal Foundation Model for Civil Aviation

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

DCVD: Dual-Channel Cross-Modal Fusion for Joint Vulnerability Detection and Localization

Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation

Training-Free Video Editing via Optical Flow-Enhanced Score Distillation

Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining

FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing

AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis

A stochastic gradient descent approach with partitioned-truncated singular value decomposition for large-scale inverse problems of magnetic modulus data

Direct visualization of ultrafast lattice ordering triggered by an electron-hole plasma in 2D perovskites

Giant modulation of the electron mobility in semiconductor Bi$_2$O$_2$Se via incipient ferroelectric phase transition

Keeping Minimal Experience to Achieve Efficient Interpretable Policy Distillation

LibFewShot: A Comprehensive Library for Few-shot Learning

Playing Lottery Tickets in Style Transfer Models

Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation

Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification

Asymmetric Distribution Measure for Few-shot Learning

Diversity Helps: Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation

Embedded Deep Bilinear Interactive Information and Selective Fusion for Multi-view Learning

Experimental evidence of monolayer AlB$_2$ with symmetry-protected Dirac cones

RGBD-Dog: Predicting Canine Pose from RGBD Sensors

Semantic Regularization: Improve Few-shot Image Classification by Reducing Meta Shift

First-principles calculations of charge carrier mobility and conductivity in bulk semiconductors and two-dimensional materials

Blur Robust Optical Flow using Motion Channel

Dense Motion Estimation for Smoke

Drift Robust Non-rigid Optical Flow Enhancement for Long Sequences

Nonrigid Optical Flow Ground Truth for Real-World Scenes with Time-Varying Shading Effects

OPML: A One-Pass Closed-Form Solution for Online Metric Learning

To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction

Towards the Design of Effective Freehand Gestural Interaction for Interactive TV

Video Interpolation using Optical Flow and Laplacian Smoothness

Visual Stability Prediction and Its Application to Manipulation

Deformation-Driven Diffusion and Plastic Flow in Two-Dimensional Amorphous Granular Pillars

Experimental Realization of Two-Dimensional Boron Sheets

Giant Piezoelectricity in Monolayer Group IV Monochalcogenides: SnSe, SnS, GeSe and GeS

Piezoelectricity in Two-Dimensional Group III Monochalcogenides

Variable Coupling Strength of Silicene on Ag(111)

Envelope function method for electrons in slowly-varying inhomogeneously deformed crystals

Learning Multi-Scale Representations for Material Classification

Persistent Dirac Fermion State on Bulk-like Si(111) Surface