Source author record

Lin Wu

Lin Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Machine Learning, cond-mat.mtrl-sci, physics.optics, quant-ph, Artificial Intelligence, Computation and Language, physics.app-ph

Catalog footprint

What is connected

18 works

8 topics

4 close collaborators

Preprint, 2026, arXiv

Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model

Prompt learning has become an effective and widely used technique for enhancing vision-language models (VLMs) such as CLIP on various downstream tasks, particularly zero-shot classification within specific domains. Existing methods typically focus on either learning class-shared prompts for a given domain or generating instance-specific prompts through conditional prompt learning. While these methods have achieved promising performance, they often overlook class-specific knowledge in prompt design, leading to suboptimal outcomes. The underlying reasons are: 1) class-specific prompts offer more fine-grained supervision than coarse class-shared prompts, which helps prevent misclassification of data from different classes into a single class; 2) compared to class-specific prompts, instance-specific prompts neglect the richer class-level information across multiple instances, potentially causing data from the same class to be divided into multiple classes. To effectively supplement existing methods with class-specific knowledge, we propose a plug-and-play Class-Aware Knowledge Injection (CAKI) framework. CAKI comprises two key components: class-specific prompt generation and query-key prompt matching. The former encodes class-specific knowledge into prompts from few-shot samples that belong to the same class and stores the learned prompts in a class-level knowledge bank. The latter provides a plug-and-play mechanism for each test instance to retrieve relevant class-level knowledge from the knowledge bank and inject such knowledge to refine model predictions. Extensive experiments demonstrate that CAKI effectively improves the performance of existing methods on base and novel classes. Code is publicly available at https://github.com/yjh576/CAKI.
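
As a rough illustration of the query-key matching idea in the abstract above, the sketch below retrieves the best-matching class-level entry from a small knowledge bank. The bank layout, the use of cosine similarity, and the names `cosine`, `retrieve` and `bank` are assumptions made for this example; they are not taken from the CAKI code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, bank, top_k=1):
    """Return the top_k (class_name, score) pairs whose keys best match the query."""
    scored = [(name, cosine(query, key)) for name, (key, _prompt) in bank.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical bank: class name -> (key vector, stored prompt vector).
bank = {
    "cat": ([1.0, 0.0, 0.0], [0.2, 0.1]),
    "dog": ([0.0, 1.0, 0.0], [0.3, 0.4]),
    "car": ([0.0, 0.0, 1.0], [0.9, 0.5]),
}

# A test-instance feature that lies closest to the "cat" key.
best = retrieve([0.9, 0.1, 0.0], bank)
```

In a real system the retrieved prompt vector would then be combined with the base model's prediction; how that injection is done is specific to the method and is not sketched here.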

Preprint, 2022, arXiv

Designing light-element materials with large effective spin-orbit coupling

Spin-orbit coupling (SOC), the core of numerous condensed-matter phenomena such as nontrivial band gaps and magnetocrystalline anisotropy, is generally considered to be appreciable only in heavy elements, which is detrimental to the synthesis and application of functional materials. Therefore, amplifying the SOC effect in light elements is of great importance. Here, focusing on 3d and 4d systems, we demonstrate that the interplay between crystal symmetry and electron correlation can dramatically enhance the SOC effect in certain partially occupied orbital multiplets, with self-consistently reinforced orbital polarization as the pivot. We then provide design principles and comprehensive databases, in which we list all the Wyckoff positions and site symmetries, in all two-dimensional (2D) and three-dimensional crystals, that can potentially host such an enhanced SOC effect. As an important demonstration, we predict nine material candidates from our selected 2D material pool as high-temperature quantum anomalous Hall insulators with large nontrivial band gaps of hundreds of meV. Our work provides an efficient and straightforward way to predict promising SOC-active materials, removing the need for heavy elements in next-generation spin-orbitronic materials and devices.

Preprint, 2022, arXiv

Exploring variational quantum eigensolver ansatzes for the long-range XY model

Finding the ground state energy and wavefunction of a quantum many-body system is a key problem in quantum physics and chemistry. We study this problem for the long-range XY model by using the variational quantum eigensolver (VQE) algorithm. We consider VQE ansatzes with full and linear entanglement structures consisting of different building gates: the CNOT gate, the controlled-rotation (CRX) gate, and the two-qubit rotation (TQR) gate. We find that the full-entanglement CRX and TQR ansatzes can sufficiently describe the ground state energy of the long-range XY model. In contrast, only the full-entanglement TQR ansatz can represent the ground state wavefunction with a fidelity close to one. In addition, we find that instead of using full-entanglement ansatzes, restricted-entanglement ansatzes where entangling gates are applied only between qubits that are a fixed distance from each other already suffice to give acceptable solutions. Using the entanglement entropy to characterize the expressive powers of the VQE ansatzes, we show that the full-entanglement TQR ansatz has the highest expressive power among them.

Preprint, 2022, arXiv

Fiber spectrum analyzer based on planar waveguide array aligned to a camera without lens

We propose and experimentally demonstrate a fiber spectrum analyzer based on a planar waveguide chip butt-coupled with an input fiber and aligned to a standard camera without any free-space optical elements. The chip consists of a single-mode waveguide to connect with the fiber, a beam broadening area, and a waveguide array in which the lengths of the waveguides are designed for both wavelength separation and beam focusing. The facet of the chip is diced open so that the outputs of the array form a near-field emitter. The far field is calculated by the Rayleigh-Sommerfeld diffraction integral. We show that the chip can provide a focal depth on the millimeter scale, allowing relaxed alignment to the camera without any fine-positioning stage. Two devices with 120 and 220 waveguides are fabricated on the polymer waveguide platform. The measured spectral widths are 0.63 nm and 0.42 nm, respectively. This simple and practical approach may lead to the development of a fiber spectrum analyzer that is easily mountable to any commercial camera, thereby avoiding the complication of customized detectors and the associated electronic circuits.

Preprint, 2022, arXiv

Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification

The cross-resolution person re-identification (CRReID) problem aims to match low-resolution (LR) query identity images against high-resolution (HR) gallery images. It is a challenging and practical problem since the query images often suffer from resolution degradation due to the different capturing conditions of real-world cameras. To address this problem, state-of-the-art (SOTA) solutions either learn a resolution-invariant representation or adopt a super-resolution (SR) module to recover the missing information from the LR query. This paper explores an alternative SR-free paradigm that directly compares HR and LR images via a dynamic metric, which is adaptive to the resolution of a query image. We realize this idea by learning resolution-adaptive representations for cross-resolution comparison. Specifically, we propose two resolution-adaptive mechanisms. The first one disentangles the resolution-specific information into different sub-vectors in the penultimate layer of the deep neural network, and thus creates a varying-length representation. To better extract resolution-dependent information, we further propose to learn resolution-adaptive masks for intermediate residual feature blocks. A novel progressive learning strategy is proposed to train those masks properly. These two mechanisms are combined to boost the performance of CRReID. Experimental results show that the proposed method is superior to existing approaches and achieves SOTA performance on multiple CRReID benchmarks.

Preprint, 2022, arXiv

Multi-modal Visual Place Recognition in Dynamics-Invariant Perception Space

Visual place recognition is one of the essential and challenging problems in the field of robotics. In this letter, we explore for the first time the use of multi-modal fusion of semantic and visual modalities in a dynamics-invariant space to improve place recognition in dynamic environments. We achieve this by first designing a novel deep learning architecture to generate the static semantic segmentation and recover the static image directly from the corresponding dynamic image. We then leverage the spatial-pyramid-matching model to encode the static semantic segmentation into feature vectors. In parallel, the static image is encoded using the popular bag-of-words model. On the basis of the above multi-modal features, we finally measure the similarity between the query image and the target landmark by the joint similarity of their semantic and visual codes. Extensive experiments demonstrate the effectiveness and robustness of the proposed approach for place recognition in dynamic environments.

Preprint, 2022, arXiv

Pseudo-Pair based Self-Similarity Learning for Unsupervised Person Re-identification

Person re-identification (re-ID) is of great importance to video surveillance systems; it estimates the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations. Unlike conventional unsupervised re-ID methods that use pseudo labels based on global clustering, we construct patch surrogate classes as initial supervision, and propose to assign pseudo labels to images through pairwise gradient-guided similarity separation. This clusters images into pseudo pairs, and the pseudo labels can be updated during training. Based on pseudo pairs, we propose to improve the generalization of the similarity function via a novel self-similarity learning: it learns local discriminative features from individual images via intra-similarity, and discovers the patch correspondence across images via inter-similarity. The intra-similarity learning is based on channel attention to detect diverse local features from an image. The inter-similarity learning employs a deformable convolution with a non-local block to align patches for cross-image similarity. Experimental results on several re-ID benchmark datasets demonstrate the superiority of the proposed method over the state of the art.

Preprint, 2020, arXiv

Exhaustive List of Topological Hourglass Band Crossings in 230 Space Groups

Topological semimetals with band crossings (BCs) near the Fermi level have attracted intense research activity in the past several years. Among various BCs, those enforced by an hourglass-like connectivity pattern, which are located at the vertex in the neck of an hourglass and are thus called hourglass BCs (HBCs), show interesting topological properties and are intimately related to the space group symmetry. By checking compatibility relations in the Brillouin zone (BZ), we list all possible HBCs for all 230 space groups by identifying the positions of HBCs as well as the compatibility relations related to the HBCs. The HBCs can coexist with conventional topological BCs such as Dirac and Weyl fermions, and based on our exhaustive list, the dimensionality and degeneracy of the HBCs can be quickly identified. It is also found that the HBCs can be classified into two categories: one contains essential HBCs which are guaranteed to exist, while the HBCs in the other category may be tuned to disappear. Our results can help in efficiently predicting hourglass semimetals in combination with first-principles calculations, as well as in studying transitions among various topological crystalline phases.

Preprint, 2020, arXiv

Reconfigurable photon sources based on quantum plexcitonic systems

A single photon in a strongly nonlinear cavity is able to block the transmission of a second photon, thereby converting incident coherent light into anti-bunched light, which is known as the photon blockade effect. On the other hand, photon anti-pairing, where only the entry of two photons is blocked and the emission of bunches of three or more photons is allowed, is based on an unconventional photon blockade mechanism due to destructive interference of two distinct excitation pathways. We propose quantum plexcitonic systems with moderate nonlinearity to generate both anti-bunched and anti-paired photons. The proposed plexcitonic systems benefit from subwavelength field localizations that make quantum emitters spatially distinguishable, thus enabling a photon source that is reconfigurable between anti-bunched and anti-paired states by tailoring the energy bands. For a realistic nanoprism plexcitonic system, two schemes of reconfiguration are suggested: (i) a chemical means, by partially changing the type of the emitters; or (ii) an optical approach, by rotating the polarization angle of the incident light to tune the coupling rate of the emitters. These results pave the way to realizing reconfigurable nonclassical photon sources in a simple quantum plexcitonic platform with readily accessible experimental conditions.

Preprint, 2020, arXiv

Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation

We present a novel approach to unsupervised domain adaptation for object detection through forward-backward cyclic (FBC) training. Recent adversarial-training-based domain adaptation methods have shown their effectiveness in minimizing domain discrepancy via alignment of marginal feature distributions. However, aligning the marginal feature distributions does not guarantee the alignment of class-conditional distributions. This limitation is more evident when adapting object detectors, as the domain discrepancy is larger than in the image classification task, e.g. a varying number of objects exist in one image and the majority of the content in an image is background. This motivates us to learn domain invariance for category-level semantics via gradient alignment. Intuitively, if the gradients of two domains point in similar directions, then the learning of one domain can improve that of the other domain. To achieve gradient alignment, we propose Forward-Backward Cyclic Adaptation, which iteratively computes adaptation from source to target via backward hopping and from target to source via forward passing. In addition, we align low-level features for adapting holistic color/texture via adversarial training. However, a detector that performs well on both domains is not ideal for the target domain. As such, in each cycle, domain diversity is enforced by maximum entropy regularization on the source domain to penalize confident source-specific learning and minimum entropy regularization on the target domain to trigger target-specific learning. Theoretical analysis of the training process is provided, and extensive experiments on challenging cross-domain object detection datasets show the superiority of our approach over the state of the art.

Preprint, 2019, arXiv

CORAL8: Concurrent Object Regression for Area Localization in Medical Image Panels

This work tackles the problem of generating a medical report for multi-image panels. We apply our solution to the Renal Direct Immunofluorescence (RDIF) assay, which requires a pathologist to generate a report based on observations across eight different whole-slide images (WSIs) in concert with existing clinical features. To this end, we propose a novel attention-based multi-modal generative recurrent neural network (RNN) architecture capable of dynamically sampling image data concurrently across the RDIF panel. The proposed methodology incorporates text from the clinical notes of the requesting physician to regulate the output of the network so that it aligns with the overall clinical context. In addition, we found that regularizing the attention weights is important for the word generation process, because the system can otherwise ignore the attention mechanism by assigning equal weights to all members. Thus, we propose two regularizations which force the system to utilize the attention mechanism. Experiments on our novel collection of RDIF WSIs provided by a large clinical laboratory demonstrate that our framework offers significant improvements over existing methods.

Preprint, 2019, arXiv

Medi-Care AI: Predicting Medications From Billing Codes via Robust Recurrent Neural Networks

In this paper, we present an effective deep prediction framework based on robust recurrent neural networks (RNNs) to predict the likely therapeutic classes of medications a patient is taking, given a sequence of diagnostic billing codes in their record. Accurately capturing the list of medications currently taken by a given patient is extremely challenging due to undefined errors and omissions. We present a general robust framework that explicitly models the possible contamination through an over-time decay mechanism on the input billing codes and noise injection into the recurrent hidden states, respectively. By doing this, billing codes are reformulated into their temporal patterns with decay rates on each medical variable, and the hidden states of the RNNs are regularised by random noise, which serves as dropout to improve the RNNs' robustness to data variability in terms of missing values and multiple errors. The proposed method is extensively evaluated on real health care data to demonstrate its effectiveness in suggesting medication orders from contaminated values.

Preprint, 2016, arXiv

Deep Linear Discriminant Analysis on Fisher Networks: A Hybrid Architecture for Person Re-identification

Person re-identification seeks a correct match for a person of interest across views among a large number of imposters. It typically involves two procedures: non-linear feature extraction that is robust to dramatic appearance changes, and subsequent discriminative analysis to reduce intra-personal variations while enlarging inter-personal differences. In this paper, we introduce a hybrid architecture which combines Fisher vectors and deep neural networks to learn non-linear representations of person images in a space where the data can be linearly separable. We reinforce a Linear Discriminant Analysis (LDA) on top of the deep neural network such that linearly separable latent representations can be learnt in an end-to-end fashion. By optimizing an objective function modified from LDA, the network is forced to produce feature distributions which have a low variance within the same class and a high variance between classes. The objective is essentially derived from the general LDA eigenvalue problem and allows the network to be trained with stochastic gradient descent, back-propagating LDA gradients to compute the gradients involved in Fisher vector encoding. For evaluation we test our approach on four benchmark data sets in person re-identification (VIPeR [1], CUHK03 [2], CUHK01 [3], and Market1501 [4]). Extensive experiments on these benchmarks show that our model can achieve state-of-the-art results.
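
The abstract above describes an objective that favours low variance within a class and high variance between classes. A minimal sketch of that idea for one-dimensional features is the classical Fisher ratio below; it is an illustrative simplification, not the modified eigenvalue objective used in the paper, and the function name `fisher_ratio` is hypothetical.

```python
def fisher_ratio(features, labels):
    """Between-class scatter divided by within-class scatter for 1-D features."""
    overall_mean = sum(features) / len(features)
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    between = 0.0
    within = 0.0
    for xs in groups.values():
        mean = sum(xs) / len(xs)
        # Scatter of the class mean around the overall mean, weighted by class size.
        between += len(xs) * (mean - overall_mean) ** 2
        # Scatter of the samples around their own class mean.
        within += sum((x - mean) ** 2 for x in xs)
    return between / within if within else float("inf")

# Same four feature values, labelled two ways.
separated = fisher_ratio([0.0, 0.2, 5.0, 5.2], [0, 0, 1, 1])  # classes form tight clusters
mixed = fisher_ratio([0.0, 5.0, 0.2, 5.2], [0, 0, 1, 1])      # classes overlap heavily
```

A representation that separates the classes yields a much larger ratio than one that mixes them, which is the property an LDA-style objective rewards.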

Preprint, 2016, arXiv

Iterative Views Agreement: An Iterative Low-Rank based Structured Optimization Method to Multi-View Spectral Clustering

Multi-view spectral clustering, which aims to yield a consensus grouping of data objects across multiple views using their graph Laplacian matrices, is a fundamental clustering problem. Among the existing methods, the Low-Rank Representation (LRR) based method stands out for its effectiveness, intuitiveness and robustness to noise corruption. However, it aggressively tries to learn a common low-dimensional subspace for multi-view data while ignoring the local manifold structure in each view, which is critically important to spectral clustering; worse still, the low-rank minimization is enforced to achieve data correlation consensus among all views, failing to flexibly preserve the local manifold structure of each view. In this paper, 1) we propose a multi-graph Laplacian regularized LRR, with each graph Laplacian corresponding to one view to characterize its local manifold structure; 2) instead of directly enforcing the low-rank minimization among all views for correlation consensus, we separately impose a low-rank constraint on each view, coupled with a mutual structural consensus constraint, which not only preserves the local manifold structure well but also serves as a constraint for the other views, iteratively making the views more agreeable. Extensive experiments on real-world multi-view data sets demonstrate its superiority.

Preprint, 2016, arXiv

PersonNet: Person Re-identification with Deep Convolutional Neural Networks

In this paper, we propose a deep end-to-end neural network to simultaneously learn high-level features and a corresponding similarity metric for person re-identification. The network takes a pair of raw RGB images as input, and outputs a similarity value indicating whether the two input images depict the same person. A layer that computes neighborhood range differences across the two input images is employed to capture local relationships between patches. This operation seeks a robust feature from the input images. By increasing the depth to 10 weight layers and using very small (3×3) convolution filters, our architecture achieves a remarkable improvement over prior-art configurations. Meanwhile, an adaptive Root-Mean-Square (RMSProp) gradient descent algorithm is integrated into our architecture, which is beneficial to deep nets. Our method consistently outperforms the state of the art on two large datasets (CUHK03 and Market-1501) and a medium-sized data set (CUHK01).

Preprint, 2016, arXiv

Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions

Learning hash functions/codes for similarity search over multi-view data is attracting increasing attention, where similar hash codes are assigned to data objects that consistently characterize neighborhood relationships across views. Traditional methods in this category inherently suffer from three limitations: 1) they commonly adopt a two-stage scheme where a similarity matrix is first constructed, followed by hash function learning; 2) they are commonly developed on the assumption that data samples with multiple representations are noise-free, which is not practical in real-life applications; 3) they often incur a cumbersome training model caused by the neighborhood graph construction using all N points in the database (O(N)). In this paper, we motivate the problem of jointly and efficiently training robust hash functions over data objects with multi-feature representations which may be corrupted by noise. To achieve both robustness and training efficiency, we propose an approach to effectively and efficiently learn low-rank kernelized hash functions shared across views (we use kernelized similarity rather than kernel, as the data-landmark affinity matrix is not a square symmetric matrix). Specifically, we utilize landmark graphs to construct tractable similarity matrices in multiple views to automatically discover neighborhood structure in the data. To learn robust hash functions, a latent low-rank kernel function is used to construct hash functions in order to accommodate linearly inseparable data. In particular, a latent kernelized similarity matrix is recovered by rank minimization on multiple kernel-based similarity matrices. Extensive experiments on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions.

Preprint, 2016, arXiv

Structured learning of metric ensembles with application to person re-identification

Matching individuals across non-overlapping camera networks, known as person re-identification, is a fundamentally challenging problem due to the large visual appearance changes caused by variations in viewpoint, lighting, and occlusion. Approaches in the literature can be categorized into two streams: the first develops reliable features against realistic conditions by combining several visual features in a pre-defined way; the second learns a metric from training data to ensure strong inter-class differences and intra-class similarities. However, seeking an optimal combination of visual features which is generic yet adaptive to different benchmarks is an unsolved problem, and metric learning models easily overfit due to the scarcity of training data in person re-identification. In this paper, we propose two effective structured-learning-based approaches which explore the adaptive effects of visual features in recognizing persons in different benchmark data sets. Our framework is built on multiple low-level visual features with an optimal ensemble of their metrics. We formulate two optimization algorithms, CMCtriplet and CMCstruct, which directly optimize the evaluation measure commonly used in person re-identification, known as the Cumulative Matching Characteristic (CMC) curve.
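
For readers unfamiliar with the CMC curve mentioned above, the sketch below computes it from a query-by-gallery distance matrix. It assumes a single-shot setting in which each query has exactly one correct gallery match; the function name `cmc_curve` and the toy data are hypothetical, and the CMCtriplet and CMCstruct optimization algorithms themselves are not shown.

```python
def cmc_curve(dist, query_ids, gallery_ids, max_rank):
    """Entry k-1 is the fraction of queries whose correct match is within the top k."""
    hits = [0] * max_rank
    for row, qid in zip(dist, query_ids):
        # Gallery indices sorted from smallest to largest distance.
        order = sorted(range(len(row)), key=lambda j: row[j])
        for rank, j in enumerate(order[:max_rank]):
            if gallery_ids[j] == qid:
                # A hit at this rank also counts for every larger rank.
                for k in range(rank, max_rank):
                    hits[k] += 1
                break
    return [h / len(query_ids) for h in hits]

dist = [
    [0.1, 0.9, 0.5],  # query "a": correct match is the nearest gallery item
    [0.4, 0.6, 0.2],  # query "b": correct match is the farthest gallery item
]
curve = cmc_curve(dist, ["a", "b"], ["a", "b", "c"], max_rank=3)
# curve == [0.5, 0.5, 1.0]: rank-1 accuracy is 50%, rank-3 accuracy is 100%
```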

Preprint, 2010, arXiv

Detecting Image Forgeries using Geometric Cues

This chapter presents a framework for detecting fake regions using various methods, including watermarking techniques and blind approaches. In particular, we describe the current categories of blind approaches, which can be divided into five: pixel-based techniques, format-based techniques, camera-based techniques, physically-based techniques and geometric-based techniques. We then take a second look at the geometric-based techniques and further categorize them in detail. In the following section, the state-of-the-art methods involved in the geometric techniques are elaborated.
