Source author record

Lin Sun

Lin Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision cond-mat.quant-gas cond-mat.str-el cond-mat.supr-con Artificial Intelligence Cryptography and Security Computation Computation and Language Machine Learning math.ST Methodology Robotics Statistics Theory

Catalog footprint

What is connected

16 works

13 topics

4 close collaborators


Preprint, 2026, arXiv

When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downstream RAG effectiveness under real-world conditions. We introduce an OCR benchmark for industrial RAG systems covering 11 challenging document types, including extreme layouts, high-resolution pages, complex or watermarked backgrounds, historical documents with non-standard reading orders, visually decorated text, and documents containing tables and mathematical formulas. Evaluating recent SOTA OCR models under a controlled OCR-first RAG pipeline shows clear performance degradation on realistic industrial documents despite strong conventional benchmark scores. We find that high OCR accuracy does not necessarily translate into strong downstream RAG performance: structural and semantic errors can cause substantial retrieval failures even when WER/CER remains low. Further analysis shows that this mismatch is category-dependent, arises through both retrieval-side and downstream generation-side failures, and remains stable across representative OCR-first pipeline choices. The benchmark is publicly available at https://github.com/Qihoo360/InduOCRBench.
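The abstract contrasts conventional character- and word-level scores (CER/WER) with downstream RAG quality. As a hedged illustration of what those conventional metrics measure, the sketch below computes CER and WER from Levenshtein edit distance using the standard definitions. It is not the benchmark's evaluation code, and the example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or word tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate on whitespace-separated tokens."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


if __name__ == "__main__":
    ref = "total 1250 due 3"
    hyp = "total 1250 due 8"  # a single wrong digit changes the answer
    print(f"CER={cer(ref, hyp):.4f}  WER={wer(ref, hyp):.2f}")
```

In this toy case one substituted character yields a CER of 1/16, yet a question about the amount due would be answered incorrectly, which is the kind of gap between OCR scores and downstream usefulness the benchmark targets.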

Preprint, 2024, arXiv

BCS-BEC crossover in atomic Fermi gases in quasi-two-dimensional Lieb lattices: Effects of flat band and finite temperature

We investigate the finite-temperature superfluid behavior of ultracold atomic Fermi gases in quasi-two-dimensional Lieb lattices with a short-range attractive interaction, using a pairing fluctuation theory within the BCS-BEC crossover framework. We find that the presence of a flat band, along with van Hove singularities, leads to exotic quantum phenomena. As the Fermi level enters the flat band, both the gap and the superfluid transition temperature $T_c$ as a function of interaction change from a conventional exponential behavior into an unusual power law, and the evolution of superfluid densities with temperature also follows a power law even at weak interactions. The quantum geometric effects, manifested by an enhanced effective pair hopping integral, may contribute significantly to both $T_c$ and the superfluid densities. As the chemical potential crosses the van Hove singularities in the weak interaction regime, the nature of pairing changes between particle-like and hole-like. A pair density wave state emerges at high densities with a relatively strong interaction strength.

Preprint, 2024, arXiv

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

Multimodal information extraction (MIE) has gained significant attention as multimedia content has grown in popularity. However, current MIE methods often rely on task-specific model structures, which limits their generalizability across tasks and underutilizes knowledge shared across MIE tasks. To address these issues, we propose UMIE, a unified multimodal information extractor that unifies three MIE tasks as a generation problem using instruction tuning and can effectively extract both textual and visual mentions. Extensive experiments show that a single UMIE model outperforms various state-of-the-art (SoTA) methods across six MIE datasets on three tasks. Furthermore, in-depth analysis demonstrates UMIE's strong generalization in the zero-shot setting, its robustness to instruction variants, and its interpretability. Our research serves as an initial step towards a unified MIE model and initiates the exploration of both instruction tuning and large language models within the MIE domain. Our code, data, and model are available at https://github.com/ZUCC-AI/UMIE

Preprint, 2022, arXiv

An integer grid bridge sampler for the Bayesian inference of incomplete birth-death records

A one-to-one correspondence is established between the bridge path space of birth-death processes and the disjoint union of the product spaces of simplexes and integer grids. Formulae are derived for the exact counting of integer grid bridges with a fixed number of upward jumps. A uniform sampler over this restricted bridge path space is then constructed. This leads to a Monte Carlo scheme, the integer grid bridge sampler (IGBS), for evaluating the transition probabilities of birth-death processes. Even near-zero probabilities of rare events can now be evaluated with controlled relative error. IGBS-based Bayesian inference for incomplete birth-death observations is demonstrated on illustrative examples and in the analysis of a severely incomplete data set recording a real epidemic event. A comparison is made with the basic bootstrap filter, an elementary sequential importance resampling algorithm. The filtering failures that plague the bootstrap filter do not arise in the new scheme.
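A rough sketch of the two ingredients named in the abstract, under our own reading of it: for a fixed number of births, a bridge splits into an integer-grid part (the order of +1/-1 jumps, constrained to keep the population non-negative) and a simplex part (ordered jump times). The paper's counting formulae are not reproduced; here the grid count is done by memoized recursion, the jump order is sampled uniformly from those counts, and the times are sorted uniforms. `count_paths` and `sample_bridge` are hypothetical names, and this is not the IGBS itself.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_paths(state: int, ups: int, downs: int) -> int:
    """Number of +1/-1 jump orders using exactly `ups` births and `downs`
    deaths, starting from `state` and never dropping below zero."""
    if ups == 0 and downs == 0:
        return 1
    total = 0
    if ups > 0:
        total += count_paths(state + 1, ups - 1, downs)
    if downs > 0 and state > 0:
        total += count_paths(state - 1, ups, downs - 1)
    return total


def sample_bridge(start: int, end: int, ups: int, horizon: float,
                  rng: random.Random):
    """Uniformly sample a birth-death bridge from `start` to `end` on
    [0, horizon] with exactly `ups` births (illustrative sketch only)."""
    downs = ups - (end - start)
    if downs < 0 or count_paths(start, ups, downs) == 0:
        raise ValueError("no admissible bridge for these counts")
    jumps, state = [], start
    while ups + downs > 0:
        n_up = count_paths(state + 1, ups - 1, downs) if ups > 0 else 0
        # Take a birth with probability (completions after a birth) / (all completions).
        if rng.random() * count_paths(state, ups, downs) < n_up:
            jumps.append(+1)
            state += 1
            ups -= 1
        else:
            jumps.append(-1)
            state -= 1
            downs -= 1
    # Sorted i.i.d. uniforms are uniform on the ordered simplex of jump times.
    times = sorted(rng.uniform(0.0, horizon) for _ in jumps)
    return list(zip(times, jumps))
```

Choosing each step in proportion to the number of admissible completions is the standard way to turn an exact count into an exact uniform sampler, which is why counting and sampling appear together in the abstract.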

Preprint, 2022, arXiv

Ground states of atomic Fermi gases in a two-dimensional optical lattice with and without population imbalance

We study the ground state phase diagram of population balanced and imbalanced ultracold atomic Fermi gases with a short range attractive interaction throughout the crossover from BCS to Bose-Einstein condensation (BEC), in a two-dimensional optical lattice (2DOL) comprised of two lattice and one continuum dimensions. We find that the mixing of lattice and continuum dimensions, together with population imbalance, has an extraordinary effect on pairing and the superfluidity of atomic Fermi gases. In the balanced case, the superfluid ground state prevails over most of the phase space. However, for relatively small lattice hopping integral $t$ and large lattice constant $d$, a pair density wave (PDW) emerges unexpectedly at intermediate coupling strength, and the nature of the in-plane and overall pairing changes from particle-like to hole-like in the BCS and unitary regimes, associated with an abnormal increase in the Fermi volume with the pairing strength. In the imbalanced case, the stable polarized superfluid phase shrinks to only a small portion of the entire phase space spanned by $t$, $d$, imbalance $p$ and interaction strength $U$, mainly in the bosonic regime of low $p$, moderately strong pairing, relatively large $t$ and small $d$. Due to the Pauli exclusion between paired and excess fermions within the confined momentum space, a PDW phase emerges and the overall pairing evolves from particle-like into hole-like as the pairing strength grows stronger in the BEC regime. In both cases, the ground state properties are largely governed by the Fermi surface topology. These findings are very different from the cases of pure 3D continuum, 3D lattice or 1DOL.

Preprint, 2022, arXiv

Pairing phenomena and superfluidity of atomic Fermi gases in a two-dimensional optical lattice: Unusual effects of lattice-continuum mixing

We study the superfluid behavior of ultracold atomic Fermi gases with a short range attractive interaction in a two-dimensional optical lattice (2DOL) using a pairing fluctuation theory, within the context of BCS-BEC crossover. We find that the mixing of lattice and continuum dimensions leads to exotic phenomena. For relatively large lattice constant $d$ and small hopping integral $t$, the superfluid transition temperature $T_c$ exhibits a remarkable reentrant behavior as a function of the interaction strength, and leads to a pair density wave ground state, where $T_c$ vanishes, for a range of intermediate coupling strength. In the unitary and BCS regimes, the nature of the in-plane and overall pairing changes from particle-like to hole-like, with an unexpected nonmonotonic dependence of the chemical potential on the pairing strength. The BEC asymptotic behaviors exhibit distinct power law dependencies on the interaction strength compared to cases of pure 3D lattice, 3D continuum, and 1DOL. These predictions can be tested in future experiments.

Preprint, 2022, arXiv

Using EBGAN for Anomaly Intrusion Detection

As an active network security protection scheme, an intrusion detection system (IDS) bears the important responsibility of detecting network attacks in the form of malicious network traffic. Intrusion detection technology is an important part of an IDS, and many researchers have studied it extensively. However, developing an efficient intrusion detection method for massive network traffic data is still difficult. Since Generative Adversarial Networks (GANs) have powerful modeling capabilities for complex high-dimensional data, they provide new ideas for addressing this problem. In this paper, we put forward an EBGAN-based intrusion detection method, IDS-EBGAN, that classifies network records as normal or malicious traffic. The generator in IDS-EBGAN converts the original malicious network traffic in the training set into adversarial malicious examples, so that adversarial learning improves the discriminator's ability to detect malicious traffic. The discriminator adopts an autoencoder model. During testing, IDS-EBGAN uses the discriminator's reconstruction error to classify traffic records.
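The final step of the abstract, classifying records by the discriminator's reconstruction error, can be sketched as a threshold rule. In this minimal sketch a rank-k linear autoencoder (PCA via SVD) stands in for the adversarially trained EBGAN discriminator, and the threshold is a high quantile of errors on normal training records; both choices, and the class name `ReconstructionDetector`, are assumptions for illustration rather than the paper's method.

```python
import numpy as np


class ReconstructionDetector:
    """Flag records whose reconstruction error exceeds a threshold.

    A rank-k linear autoencoder (PCA) stands in for the trained EBGAN
    discriminator; this substitution is an assumption for illustration.
    """

    def __init__(self, rank: int, quantile: float = 0.99):
        self.rank = rank
        self.quantile = quantile

    def fit(self, normal: np.ndarray) -> "ReconstructionDetector":
        self.mean = normal.mean(axis=0)
        # The top right-singular vectors span the subspace of normal traffic.
        _, _, vt = np.linalg.svd(normal - self.mean, full_matrices=False)
        self.basis = vt[: self.rank]
        # Threshold: a high quantile of errors on normal training records.
        self.threshold = np.quantile(self.errors(normal), self.quantile)
        return self

    def errors(self, x: np.ndarray) -> np.ndarray:
        """Per-record mean squared reconstruction error."""
        centered = x - self.mean
        recon = centered @ self.basis.T @ self.basis  # encode, then decode
        return np.mean((centered - recon) ** 2, axis=1)

    def predict(self, x: np.ndarray) -> np.ndarray:
        """True = malicious (error above threshold), False = normal."""
        return self.errors(x) > self.threshold
```

Records that the model reconstructs poorly are treated as unlike the normal training traffic; in IDS-EBGAN the adversarial examples from the generator are meant to sharpen exactly this separation.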

Preprint, 2022, arXiv

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

Detecting objects from LiDAR point clouds is of tremendous significance in autonomous driving. In spite of good progress, accurate and reliable 3D detection has yet to be achieved due to the sparsity and irregularity of LiDAR point clouds. Among existing strategies, multi-view methods have shown great promise by leveraging the more comprehensive information from both bird's eye view (BEV) and range view (RV). These multi-view methods either refine the proposals predicted from a single view via fused features, or fuse the features without considering the global spatial context; consequently, their performance is limited. In this paper, we propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one. Thanks to the learned attention mechanism, VISTA can produce high-quality fused features for the prediction of proposals. We decouple the classification and regression tasks in VISTA, and apply an additional constraint on attention variance that enables the attention module to focus on specific targets instead of generic points. We conduct thorough experiments on the nuScenes and Waymo benchmarks; the results confirm the efficacy of our designs. At the time of submission, our method achieves 63.0% in overall mAP and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods by up to 24% in safety-critical categories such as cyclist. The source code in PyTorch is available at https://github.com/Gorilla-Lab-SCUT/VISTA

Preprint, 2021, arXiv

RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER

Recently, multimodal named entity recognition (MNER) has used images to improve the accuracy of NER in tweets. However, most multimodal methods use attention mechanisms to extract visual clues regardless of whether the text and image are relevant. In practice, irrelevant text-image pairs account for a large proportion of tweets, and visual clues unrelated to the text can have uncertain or even negative effects on multimodal model learning. In this paper, we introduce a method of text-image relation propagation into the multimodal BERT model. We integrate soft or hard gates to select visual clues and propose a multitask algorithm to train on the MNER datasets. In the experiments, we analyze in depth the changes in visual attention before and after the use of text-image relation propagation. Our model achieves state-of-the-art performance on the MNER datasets.

Preprint, 2020, arXiv

BiSample: Bidirectional Sampling for Handling Missing Data with Local Differential Privacy

Local differential privacy (LDP) has received much interest recently. In existing protocols with LDP guarantees, a user encodes and perturbs their data locally before sharing it with the aggregator. In common practice, however, users may prefer not to answer every question because they have different privacy preferences for different questions, which leads to missing data or a loss of data quality. In this paper, we demonstrate a new approach to the challenges of data perturbation that takes users' privacy preferences into account. Specifically, we first propose BiSample, a bidirectional sampling technique for value perturbation in the framework of LDP. We then combine the BiSample mechanism with users' privacy preferences for missing data perturbation. Theoretical analysis and experiments on a set of datasets confirm the effectiveness of the proposed mechanisms.
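For readers new to LDP, the idea of a user perturbing data locally and the aggregator debiasing the result can be seen in the classic randomized response mechanism for a single binary answer, sketched below. This is a swapped-in textbook mechanism, not BiSample: the bidirectional sampling of numeric values and the handling of missing answers described above are not shown, and the function names are hypothetical.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    The ratio of report probabilities is at most e^eps, so this is eps-LDP."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_keep else 1 - bit


def estimate_frequency(reports, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from perturbed reports."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1 - p)*(1 - f) = (1 - p) + f*(2p - 1); solve for f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    truth = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
    reports = [randomized_response(b, 1.0, rng) for b in truth]
    print(sum(truth) / len(truth), estimate_frequency(reports, 1.0))
```

Each individual report is noisy, but the aggregator can still recover population-level frequencies, which is the trade-off any LDP protocol, including BiSample, has to manage.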

Preprint, 2020, arXiv

HRDNet: High-resolution Detection Network for Small Objects

Small object detection is challenging because small objects do not contain detailed information and may even disappear in deep networks. Feeding high-resolution images into a network can usually alleviate this issue. However, simply enlarging the resolution causes other problems: it aggravates the large variation in object scale and introduces unbearable computation cost. To keep the benefits of high-resolution images without introducing new problems, we propose the High-Resolution Detection Network (HRDNet). HRDNet takes multiple resolution inputs using multi-depth backbones. To take full advantage of multiple features, we propose the Multi-Depth Image Pyramid Network (MD-IPN) and the Multi-Scale Feature Pyramid Network (MS-FPN) in HRDNet. MD-IPN maintains positional information from multiple resolutions using backbones of multiple depths. Specifically, high-resolution input is fed into a shallow network to preserve more positional information and reduce the computational cost, while low-resolution input is fed into a deep network to extract more semantics. By extracting various features from high to low resolutions, MD-IPN improves the performance of small object detection while maintaining performance on medium and large objects. MS-FPN aligns and fuses the multi-scale feature groups generated by MD-IPN to reduce the information imbalance between these multi-scale, multi-level features. Extensive experiments and ablation studies are conducted on the standard benchmark datasets MS COCO2017 and Pascal VOC2007/2012 and on a typical small object dataset, VisDrone 2019. Notably, HRDNet achieves state-of-the-art results on these datasets and performs better on small objects.

Preprint, 2020, arXiv

IPG-Net: Image Pyramid Guidance Network for Small Object Detection

For Convolutional Neural Network-based object detection, there is a typical dilemma: spatial information is well preserved in the shallow layers, which unfortunately lack semantic information, while the deep layers carry high-level semantic concepts but have lost much spatial information, resulting in serious information imbalance. To give shallow layers enough semantic information, Feature Pyramid Networks (FPN) build a top-down propagation path. In this paper, in addition to top-down combining of information for shallow layers, we propose a novel network called Image Pyramid Guidance Network (IPG-Net) to ensure that both spatial and semantic information are abundant in every layer. IPG-Net has two main parts: the image pyramid guidance transformation module and the image pyramid guidance fusion module. Our main idea is to introduce image pyramid guidance into the backbone stream to solve the information imbalance problem, which alleviates the vanishing of small object features. The IPG transformation module ensures that, even in the deepest stage of the backbone, there is enough spatial information for bounding box regression and classification. Furthermore, we design an effective fusion module to fuse the features from the image pyramid with those from the backbone stream. We apply this network to both one-stage and two-stage detection models and obtain state-of-the-art results on the most popular benchmark datasets, MS COCO and Pascal VOC.

Preprint, 2020, arXiv

Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles

Trajectory prediction is crucial for autonomous vehicles. The planning system needs to know not only the current state of the surrounding objects but also their possible future states. Vehicle trajectories are significantly influenced by lane geometry, and how to use lane information effectively is of active interest. Most existing works use rasterized maps to explore road information, which do not distinguish between different lanes. In this paper, we propose a novel instance-aware lane representation. By integrating lane features and trajectory features, a goal-oriented lane attention module is proposed to predict the future locations of the vehicle. We show that the proposed lane representation, together with the lane attention module, can be integrated into the widely used encoder-decoder framework to generate diverse predictions. Most importantly, each generated trajectory is associated with a probability to handle uncertainty. Our method does not collapse to a single behavior mode and can cover diverse possibilities. Extensive experiments and ablation studies on the benchmark datasets corroborate the effectiveness of our proposed method. Notably, our method ranked third in the Argoverse motion forecasting competition at NeurIPS 2019.

Preprint, 2020, arXiv

Superfluidity and pairing phenomena in ultracold atomic Fermi gases in one-dimensional optical lattices, Part II: Effects of population imbalance

In this paper, we study the effect of population imbalance and its interplay with pairing strength and lattice effects in atomic Fermi gases in a one-dimensional optical lattice. We compute various phase diagrams as the system undergoes BCS-BEC crossover, using the same pairing fluctuation theory as in Part I. We find widespread pseudogap phenomena beyond the BCS regime and intermediate temperature superfluid states for relatively low population imbalances. The Fermi surface topology plays an important role in the behavior of $T_\text{c}$. For large $d$ and/or small $t$, which yield an open Fermi surface, superfluidity can be readily destroyed by a small amount of population imbalance $p$. The superfluid phase, especially in the BEC regime, can exist only in a highly restricted volume of the parameter space. Due to the continuum-lattice mixing, population imbalance gives rise to a new mechanism for pair hopping, assisted by excess majority fermions, which may lead to a significant enhancement of $T_\text{c}$ on the BEC side of the Feshbach resonance and also cause $T_\text{c}$ to approach a constant asymptote in the BEC limit, when it exists. Furthermore, we find that, unlike the 3D continuum case, not all minority fermions are paired in the BEC limit. These predictions can be tested in future experiments.
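The link between $t$ and Fermi surface topology can be made concrete with a worked criterion. The abstract does not state the dispersion; assuming the tight-binding form commonly used for one lattice dimension ($z$, lattice constant $d$, hopping $t$) and two continuum dimensions, and neglecting interactions, the lattice contribution is bounded by $4t$:

```latex
\epsilon_{\mathbf{k}} = \frac{\hbar^2\left(k_x^2 + k_y^2\right)}{2m} + 2t\,\bigl[1 - \cos(k_z d)\bigr],
\qquad k_z \in \left[-\frac{\pi}{d}, \frac{\pi}{d}\right],
\qquad 0 \le 2t\,\bigl[1 - \cos(k_z d)\bigr] \le 4t .

% The surface \epsilon_{\mathbf{k}} = E_F has a solution k_\parallel^2 \ge 0 at the
% zone boundary k_z = \pm\pi/d only if E_F \ge 4t:
E_F < 4t \;\Rightarrow\; \text{closed Fermi surface}, \qquad
E_F > 4t \;\Rightarrow\; \text{open Fermi surface}.
```

Under this assumption a small $t$ lowers the $4t$ threshold directly, so an open Fermi surface is reached at lower densities; the interacting problem treated in the paper modifies this free-fermion picture.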

Preprint, 2020, arXiv

Unusual destruction and enhancement of superfluidity of atomic Fermi gases by population imbalance in a one-dimensional optical lattice

We study the superfluid behavior of population-imbalanced ultracold atomic Fermi gases with a short range attractive interaction in a one-dimensional (1D) optical lattice, using a pairing fluctuation theory. We show that, besides widespread pseudogap phenomena and intermediate temperature superfluidity, the superfluid phase is readily destroyed except in a limited region of the parameter space. We find a new mechanism for pair hopping, assisted by the excess majority fermions in the presence of continuum-lattice mixing, which leads to an unusual constant BEC asymptote for $T_c$ that is independent of pairing strength. As a result, on the BEC side of unitarity, superfluidity, when it exists, may be strongly enhanced by population imbalance.

Preprint, 2015, arXiv

Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks

Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks (CNN) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexity of training 3D convolution kernels and the need for large quantities of training videos, only limited success has been reported. This motivated us to investigate a new deep architecture that can handle 3D signals more effectively. Specifically, we propose factorized spatio-temporal convolutional networks (FstCN), which factorize the original 3D convolution kernel learning into a sequential process of learning 2D spatial kernels in the lower layers (called spatial convolutional layers), followed by learning 1D temporal kernels in the upper layers (called temporal convolutional layers). We introduce a novel transformation and permutation operator to make this factorization possible. Moreover, to address the issue of sequence alignment, we propose an effective training and inference strategy based on sampling multiple video clips from a given action video sequence. We have tested FstCN on two commonly used benchmark datasets (UCF-101 and HMDB-51). Without using auxiliary training videos to boost performance, FstCN outperforms existing CNN-based methods and achieves performance comparable to a recent method that benefits from auxiliary training videos.
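Why a 2D-spatial-then-1D-temporal pipeline can stand in for 3D kernels is easiest to see in the separable case: if a 3D kernel is the outer product of a temporal kernel and a spatial kernel, correlating each frame in 2D and then correlating across frames in 1D reproduces the full 3D correlation exactly, with $k_h k_w + k_t$ weights instead of $k_t k_h k_w$ (12 versus 27 for 3x3x3). This NumPy sketch shows only that identity; FstCN's stacked layers are not limited to this rank-1 case, its transformation and permutation operator is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def corr3d_valid(video: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Full 3D 'valid' cross-correlation of a (T, H, W) clip with a 3D kernel."""
    kt, kh, kw = k.shape
    T, H, W = video.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(video[t:t + kt, i:i + kh, j:j + kw] * k)
    return out


def corr2d_valid(frame: np.ndarray, k: np.ndarray) -> np.ndarray:
    """2D 'valid' cross-correlation of one (H, W) frame."""
    kh, kw = k.shape
    H, W = frame.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * k)
    return out


def factorized(video: np.ndarray, k2d: np.ndarray, k1d: np.ndarray) -> np.ndarray:
    """Spatial 2D correlation per frame, then 1D correlation along time."""
    spatial = np.stack([corr2d_valid(f, k2d) for f in video])  # (T, H', W')
    kt = len(k1d)
    out = np.zeros((spatial.shape[0] - kt + 1,) + spatial.shape[1:])
    for t in range(out.shape[0]):
        # Weighted sum of kt consecutive spatial maps.
        out[t] = np.tensordot(k1d, spatial[t:t + kt], axes=(0, 0))
    return out
```

Building `k3d = k1d[:, None, None] * k2d[None, :, :]` and comparing `corr3d_valid(video, k3d)` with `factorized(video, k2d, k1d)` gives matching outputs up to floating-point error.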
