Source author record

Lei Kang

Lei Kang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision cond-mat.mtrl-sci cond-mat.mes-hall physics.comp-ph

Catalog footprint

What is connected

9 works

4 topics

4 close collaborators


preprint · 2026 · arXiv

AVIR: Adaptive Visual In-Document Retrieval for Efficient Multi-Page Document Question Answering

Multi-page Document Visual Question Answering (MP-DocVQA) remains challenging because long documents not only strain computational resources but also reduce the effectiveness of the attention mechanism in large vision-language models (LVLMs). We tackle these issues with an Adaptive Visual In-document Retrieval (AVIR) framework. A lightweight retrieval model first scores each page for question relevance. Pages are then clustered according to the score distribution to adaptively select relevant content. The clustered pages are screened again by Top-K to keep the context compact. For short documents, however, clustering becomes less reliable, so we instead use a relevance probability threshold to select pages. Only the selected pages are fed to a frozen LVLM for answer generation, eliminating the need for model fine-tuning. The proposed AVIR framework reduces the average page count required for question answering by 70%, while achieving an ANLS of 84.58% on the MP-DocVQA dataset, surpassing previous methods at significantly lower computational cost. The effectiveness of AVIR is also verified on the SlideVQA and DUDE benchmarks. The code is available at https://github.com/Li-yachuan/AVIR.
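The page-selection logic described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that): the function names, the use of 1-D 2-means as the clustering step, the short-document cutoff of 4 pages, the 0.5 threshold, and the fallback to the single best page are all assumptions made for illustration. Relevance scores are assumed to come from some external retrieval model.

```python
from typing import List, Sequence


def high_score_cluster(scores: Sequence[float], iters: int = 20) -> List[int]:
    """Split scores into two clusters with 1-D 2-means and return the indices
    of the higher-scoring cluster (an assumed stand-in for the clustering
    step; the abstract does not say which algorithm is used)."""
    lo, hi = min(scores), max(scores)
    if lo == hi:  # all pages score the same: nothing to separate
        return list(range(len(scores)))
    for _ in range(iters):
        high = [s for s in scores if abs(s - hi) <= abs(s - lo)]
        low = [s for s in scores if abs(s - hi) > abs(s - lo)]
        new_hi, new_lo = sum(high) / len(high), sum(low) / len(low)
        if (new_hi, new_lo) == (hi, lo):  # centroids stopped moving
            break
        hi, lo = new_hi, new_lo
    return [i for i, s in enumerate(scores) if abs(s - hi) <= abs(s - lo)]


def select_pages(
    scores: Sequence[float],
    top_k: int = 3,                # hypothetical cap on pages kept
    short_doc_pages: int = 4,      # hypothetical "short document" cutoff
    prob_threshold: float = 0.5,   # hypothetical relevance threshold
) -> List[int]:
    """Return the indices of the pages to pass to the frozen LVLM."""
    if not scores:
        return []
    if len(scores) <= short_doc_pages:
        # Short document: threshold the relevance probabilities directly.
        chosen = [i for i, s in enumerate(scores) if s >= prob_threshold]
        if not chosen:  # assumed fallback: keep the single best page
            chosen = [max(range(len(scores)), key=scores.__getitem__)]
    else:
        # Long document: cluster by score, then screen the cluster by Top-K.
        cluster = high_score_cluster(scores)
        chosen = sorted(cluster, key=lambda i: scores[i], reverse=True)[:top_k]
    return sorted(chosen)  # restore reading order
```

For a ten-page document in which pages 1, 3 and 6 stand out, `select_pages(scores, top_k=2)` would keep only the two highest-scoring pages of that cluster, which is the "cluster, then Top-K" behavior the abstract describes.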

preprint · 2026 · arXiv

SIMI: Self-information Mining Network for Low-light Image Enhancement

Poor lighting conditions significantly impact image quality, posing substantial challenges for image editing and visualization. Many existing enhancement methods aim at proposing complex models while neglecting the intrinsic information contained within low-light images. In this work, we propose the Self-Information Mining (SIMI) network, an innovative unsupervised framework that decomposes low-light images into multiple components based on bit-plane decomposition. Our approach allows mining intrinsic information without relying on external data. This not only accelerates model convergence but also improves performance and reduces computational overhead. The unsupervised nature of our method facilitates real-world applicability. Experiments conducted on standard benchmarks demonstrate that SIMI achieves state-of-the-art performance.
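Bit-plane decomposition, which the abstract names as the basis of SIMI, can be shown on its own with a small sketch. This covers only the generic decomposition of an 8-bit grayscale image into eight binary planes and its exact inverse; how the SIMI network consumes these planes is not stated in the abstract and is not modeled here. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit pixel values in 0..255


def bit_planes(image: Image) -> List[Image]:
    """Return 8 binary planes; planes[b][y][x] is bit b of pixel (y, x).
    Plane 0 is the least significant bit, plane 7 the most significant."""
    return [[[(p >> b) & 1 for p in row] for row in image] for b in range(8)]


def reconstruct(planes: List[Image]) -> Image:
    """Recombine the binary planes into the original 8-bit image."""
    height, width = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[b][y][x] << b for b in range(8)) for x in range(width)]
        for y in range(height)
    ]


image = [[0, 5], [200, 255]]
planes = bit_planes(image)
assert reconstruct(planes) == image  # the decomposition is lossless
```

In a low-light image most pixel values are small, so the high-order planes are mostly zero and the visible structure sits in the low-order planes, which is one reason such a decomposition exposes information already present in the input.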

preprint · 2022 · arXiv

Content and Style Aware Generation of Text-line Images for Handwriting Recognition

Handwritten Text Recognition has achieved an impressive performance in public benchmarks. However, due to the high inter- and intra-class variability between handwriting styles, such recognizers need to be trained using huge volumes of manually labeled training data. To alleviate this labor-consuming problem, synthetic data produced with TrueType fonts has been often used in the training loop to gain volume and augment the handwriting style variability. However, there is a significant style bias between synthetic and real data which hinders the improvement of recognition performance. To deal with such limitations, we propose a generative method for handwritten text-line images, which is conditioned on both visual appearance and textual content. Our method is able to produce long text-line samples with diverse handwriting styles. Once properly trained, our method can also be adapted to new target data by only accessing unlabeled text-line images to mimic handwritten styles and produce images with any textual content. Extensive experiments have been done on making use of the generated samples to boost Handwritten Text Recognition performance. Both qualitative and quantitative results demonstrate that the proposed approach outperforms the current state of the art.

preprint · 2021 · arXiv

Role of Interlayer Coupling in the Second Harmonic Generation of Bilayer Transition-metal Dichalcogenides

Little is known about the role of weak interlayer coupling in the second harmonic generation (SHG) effects of two-dimensional van der Waals (vdW) systems. In this article, taking homo-bilayer $MoS_2/MoS_2$ and hetero-bilayer $MoS_2/MoSe_2$ as typical examples, we have systematically investigated their SHG susceptibilities as a function of interlayer hopping strength ($t_{int}$) using first-principles calculations. For the SHG at the zero-frequency limit of both $MoS_2/MoS_2$ and $MoS_2/MoSe_2$, although an increase of $t_{int}$ can increase the intensities of interlayer optical transitions (IOT), the increased band repulsion around the $\Gamma$ point can eventually decrease their SHG values; the larger the $t_{int}$, the smaller the SHG response. For the SHG spectra of $MoS_2/MoSe_2$ in the low photon-energy region, in contrast to $MoS_2/MoS_2$, the peak values are very sensitive to $t_{int}$, because the strongly $t_{int}$-dependent IOT dominates at the band edge; the larger the $t_{int}$, the larger the SHG. For the SHG of $MoS_2/MoS_2$ in the high photon-energy region, compared to $MoS_2/MoSe_2$, the peak values decrease much more noticeably as $t_{int}$ increases, due to the larger reduction of the band nesting effect. Our study not only can successfully explain the puzzling experimental observations of different SHG responses in different bilayer transition-metal dichalcogenides under variable $t_{int}$, but also may provide a general understanding for designing controllable SHG effects in vdW systems.

preprint · 2020 · arXiv

GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images

Although current image generation methods have reached impressive quality levels, they are still unable to produce plausible yet diverse images of handwritten words. On the contrary, when writing by hand, a great variability is observed across different writers, and even when analyzing words scribbled by the same individual, involuntary variations are conspicuous. In this work, we take a step closer to producing realistic and varied artificially rendered handwritten words. We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content. Our generator is guided by three complementary learning objectives: to produce realistic images, to imitate a certain handwriting style and to convey a specific textual content. Our model is unconstrained to any predefined vocabulary, being able to render whatever input word. Given a sample writer, it is also able to mimic its calligraphic features in a few-shot setup. We significantly advance over prior art and demonstrate with qualitative, quantitative and human-based evaluations the realistic aspect of our synthetically produced images.

preprint · 2020 · arXiv

Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition

The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios.

preprint · 2020 · arXiv

Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition

Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges: (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance; it thus provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.

preprint · 2019 · arXiv

Giant Enhancement of Solid Solubility in Monolayer BNC Alloys by Selective Orbital Coupling

Solid solubility (SS) is one of the most important features of alloys, and it is usually difficult to tune substantially across the entire range of alloy concentrations by external approaches. Some alloys that were supposed to have promising physical properties could turn out to be much less useful because of their poor SS, as is the case for monolayer BNC [$(BN)_{1-x}(C_2)_x$] alloys. Until now, an effective approach for significantly enhancing the SS of $(BN)_{1-x}(C_2)_x$ over the entire range of $x$ has been lacking. In this article, a novel mechanism of selective orbital coupling between high-energy wrong-bond states and surface states, mediated by the specific substrate, is proposed to stabilize the wrong bonds and in turn significantly enhance the SS of $(BN)_{1-x}(C_2)_x$ alloys. Surprisingly, we demonstrate that five ordered alloys, exhibiting variable direct quasi-particle bandgaps from 1.35 to 3.99 eV, can spontaneously be formed at different $x$ when $(BN)_{1-x}(C_2)_x$ is grown on hcp-phase Cr. Interestingly, the optical transitions around the band edges in these ordered alloys, accompanied by largely tunable exciton binding energies of ~1 eV at different $x$, are significantly strong due to their unique band structures. Importantly, the disordered $(BN)_{1-x}(C_2)_x$ alloys, exhibiting fully tunable bandgaps from 0 to ~6 eV over the entire range of $x$, can be formed on a Cr substrate at a miscibility temperature of ~1200 K, which is greatly reduced compared to the 4500-5600 K required in free-standing form or on other substrates. Our discovery not only may resolve the long-standing SS problem of BNC alloys, but also could significantly extend the use of BNC alloys in various optoelectronic applications.

preprint · 2010 · arXiv

Negative refraction at deep-ultraviolet frequency in monocrystalline graphite

Negative refraction is such a prominent electromagnetic phenomenon that most researchers believe it can only occur in artificially engineered metamaterials. In this article, we report negative refraction for all incident angles for the first time in a naturally existing material. Using ellipsometry measurement of the equifrequency contour in the deep-ultraviolet frequency region (typically 254 nm), obvious negative refraction was demonstrated in monocrystalline graphite for incident angles ranging from 20° to 70°. This negative refraction is attributed to extremely strong anisotropy in the crystal structure of graphite, which gives the crystal indefinite permeability. This result not only explores a new route to identifying natural negative-index materials, but it also holds promise for the development of an ultraviolet hyperlens, which may lead to a breakthrough in nanolithography, the most critical technology necessary for the next generation of electronics.
