Source author record

Yunlu Xu

Yunlu Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, cond-mat.mes-hall, cond-mat.mtrl-sci, Machine Learning, Neural and Evolutionary Computing, physics.optics

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

Preprint · 2022 · arXiv

DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

This paper presents DavarOCR, an open-source toolbox for OCR and document understanding tasks. DavarOCR currently implements 19 advanced algorithms, covering 9 different task forms, and provides detailed usage instructions and trained models for each algorithm. Compared with previous open-source OCR toolboxes, DavarOCR offers more complete support for the sub-tasks of cutting-edge document understanding. To promote the development and application of OCR technology in academia and industry, we emphasize modules that can be shared across technical sub-domains. DavarOCR is publicly released at https://github.com/hikopensource/Davar-Lab-OCR.

Preprint · 2022 · arXiv

E2-AEN: End-to-End Incremental Learning with Adaptively Expandable Network

Expandable networks have demonstrated their advantages in dealing with the catastrophic forgetting problem in incremental learning. Considering that different tasks may need different structures, recent methods design dynamic structures adapted to different tasks via sophisticated techniques. Their routine is to search for expandable structures first and then train on the new tasks, which, however, breaks training into multiple stages, leading to suboptimal results or excessive computational cost. In this paper, we propose an end-to-end trainable adaptively expandable network named E2-AEN, which dynamically generates lightweight structures for new tasks without any accuracy drop on previous tasks. Specifically, the network contains a series of powerful feature adapters that augment the previously learned representations for new tasks and avoid task interference. These adapters are controlled via an adaptive gate-based pruning strategy that decides whether the expanded structures can be pruned, making the network structure dynamically changeable according to the complexity of the new tasks. Moreover, we introduce a novel sparsity-activation regularization to encourage the model to learn discriminative features with limited parameters. E2-AEN reduces cost and can be built upon any feed-forward architecture in an end-to-end manner. Extensive experiments on both classification (i.e., CIFAR and VDD) and detection (i.e., COCO, VOC and ICCV2021 SSLAD challenge) benchmarks demonstrate the effectiveness of the proposed method, which achieves remarkable new results.

Preprint · 2022 · arXiv

Technical Report for ICCV 2021 Challenge SSLAD-Track3B: Transformers Are Better Continual Learners

In the SSLAD-Track 3B challenge on continual learning, we propose the method of COntinual Learning with Transformer (COLT). We find that transformers suffer less from catastrophic forgetting than convolutional neural networks. The major principle of our method is to equip the transformer-based feature extractor with old-knowledge distillation and head-expanding strategies to combat catastrophic forgetting. In this report, we first introduce the overall framework of continual learning for object detection. Then, we analyse the effect of the key elements of our solution on withstanding catastrophic forgetting. Our method achieves 70.78 mAP on the SSLAD-Track 3B challenge test set.
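The abstract names old-knowledge distillation and head expanding without giving formulas. As a rough illustration only, the sketch below uses a generic temperature-softened KL distillation term, which is not necessarily the loss used in COLT. It compares the old model's class logits with the first entries of an expanded head. The function names, the temperature value and the convention that old classes occupy the leading outputs are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(old || new) restricted to the classes the old model knew.

    Assumption: the expanded head keeps the old classes in its first
    len(old_logits) outputs and appends new classes after them.
    """
    kept = new_logits[: len(old_logits)]
    p_old = softmax(old_logits, temperature)
    p_new = softmax(kept, temperature)
    return sum(
        p * math.log(p / max(q, 1e-12))
        for p, q in zip(p_old, p_new)
        if p > 0.0
    )


# An expanded head that preserves the old logits incurs no penalty,
# while drifting old-class logits are penalized.
unchanged = distillation_loss([1.0, 2.0], [1.0, 2.0, 5.0])
drifted = distillation_loss([1.0, 2.0], [2.0, 1.0, 5.0])
print(unchanged, drifted)
```

In a training loop, a term like this would typically be added to the detection loss on the new classes, with a weight that trades off plasticity against retention.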

Preprint · 2022 · arXiv

TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents

Recently, automatically extracting information from visually rich documents (e.g., tickets and resumes) has become a hot and vital research topic due to its widespread commercial value. Most existing methods divide this task into two subparts: the text reading part, which obtains the plain text from the original document images, and the information extraction part, which extracts key contents. These methods mainly focus on improving the second part while neglecting that the two parts are highly correlated. This paper proposes a unified end-to-end information extraction framework for visually rich documents, in which text reading and information extraction reinforce each other via a well-designed multi-modal context block. Specifically, the text reading part provides multi-modal features such as visual, textual and layout features. The multi-modal context block fuses the generated multi-modal features, and even the prior knowledge from a pre-trained language model, for better semantic representation. The information extraction part is responsible for generating key contents from the fused context features. The framework can be trained in an end-to-end manner, achieving global optimization. What is more, we define and group visually rich documents into four categories across two dimensions, the layout and the text type. For each document category, we provide or recommend the corresponding benchmarks, experimental settings and strong baselines to remedy the lack of a uniform evaluation standard in this research area. Extensive experiments on four kinds of benchmarks (from fixed layout to variable layout, from full-structured text to semi-unstructured text) are reported, demonstrating the proposed method's effectiveness. Data, source code and models are available.

Preprint · 2020 · arXiv

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units

Recurrent neural networks (RNNs) have been widely studied in sequence learning tasks, and the mainstream models (e.g., LSTM and GRU) rely on a gating mechanism that controls how information flows between hidden states. However, the vanilla gates in RNNs (e.g., the input gate in LSTM) suffer from the problem of gate undertraining, which can be caused by various factors, such as saturating activation functions, the gate layout (e.g., the gate number and gating functions), or even a suboptimal memory state. These may cause the gates to fail to learn their switching roles and thus lead to weak performance. In this paper, we propose a new gating mechanism within general gated recurrent neural networks to handle this issue. Specifically, the proposed gates, denoted as refined gates, directly short-connect the extracted input features to the outputs of vanilla gates. The refining mechanism enhances gradient back-propagation and extends the gating activation scope, which can guide the RNN to reach possibly deeper minima. We verify the proposed gating mechanism on three popular types of gated RNNs: LSTM, GRU and MGU. Extensive experiments on 3 synthetic tasks, 3 language modeling tasks and 5 scene text recognition benchmarks demonstrate the effectiveness of our method.

Preprint · 2014 · arXiv

Approaching the Limits of Transparency and Conductivity in Graphitic Materials through Lithium Intercalation

Various bandstructure engineering methods have been studied to improve the performance of graphitic transparent conductors; however, none has demonstrated an increase of optical transmittance in the visible range. Here we measure in situ optical transmittance spectra and electrical transport properties of ultrathin graphite (3-60 graphene layers) simultaneously via electrochemical lithiation/delithiation. Upon intercalation we observe an increase of both optical transmittance (up to twofold) and electrical conductivity (up to two orders of magnitude), strikingly different from other materials. Transmission as high as 91.7% with a sheet resistance of 3.0 Ω per square is achieved for 19-layer LiC6, which corresponds to a figure of merit σ_dc/σ_opt = 1400, significantly higher than that of any other continuous transparent electrode. The unconventional modification of the optoelectronic properties of ultrathin graphite is explained by the suppression of interband optical transitions and a small intraband Drude conductivity near the interband edge. Our techniques enable the investigation of other aspects of intercalation in nanostructures.
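The quoted figure of merit can be checked against the transmittance and sheet resistance given in the abstract. The sketch below assumes the thin-film relation commonly used for transparent conductors, T = (1 + (Z0 / (2 Rs)) (σ_opt/σ_dc))^-2, with Z0 the impedance of free space. The abstract does not state which expression the authors used, so this relation and the function name are assumptions.

```python
Z0_OHM = 376.730313668  # impedance of free space, in ohms


def figure_of_merit(transmittance, sheet_resistance_ohm_sq):
    """Return sigma_dc / sigma_opt from T and Rs.

    Inverts T = (1 + Z0 / (2 * Rs) * sigma_opt / sigma_dc) ** -2,
    the thin-film relation assumed for this check.
    """
    if not 0.0 < transmittance < 1.0:
        raise ValueError("transmittance must lie strictly between 0 and 1")
    return Z0_OHM / (2.0 * sheet_resistance_ohm_sq * (transmittance ** -0.5 - 1.0))


# Values quoted in the abstract for 19-layer LiC6.
fom = figure_of_merit(0.917, 3.0)
print(f"sigma_dc / sigma_opt is about {fom:.0f}")
```

Under this assumed relation the result comes out at roughly 1.4 × 10^3, in line with the value of 1400 quoted in the abstract.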

Preprint · 2014 · arXiv

The Shockley-Queisser limit for nanostructured solar cells

The Shockley-Queisser limit describes the maximum solar energy conversion efficiency achievable for a particular material and is the standard by which new photovoltaic technologies are compared. This limit is based on the principle of detailed balance, which equates the photon flux into a device to the particle flux (photons or electrons) out of that device. Nanostructured solar cells represent a new class of photovoltaic devices, and questions have been raised about whether or not they can exceed the Shockley-Queisser limit. Here we show that single-junction nanostructured solar cells have a theoretical maximum efficiency of 42% under AM 1.5 solar illumination. While this exceeds the efficiency of a non-concentrating planar device, it does not exceed the Shockley-Queisser limit for a planar device with optical concentration. We conclude that nanostructured solar cells offer an important route towards higher efficiency photovoltaic devices through a built-in optical concentration.
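One way to see why optical concentration raises the achievable efficiency is the textbook ideal-diode estimate of the open-circuit voltage. This is a simplification and not the paper's full detailed-balance calculation. If the photocurrent scales by a factor X while the dark current is unchanged, V_oc rises by about (kT/q) ln X. The sketch below evaluates that gain. The function name, the 300 K cell temperature and the example factor X = 10 are assumptions chosen for illustration.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def voc_gain(concentration, temperature_k=300.0):
    """Approximate open-circuit voltage gain, in volts, for concentration X.

    Uses the ideal-diode result delta V_oc = (k T / q) * ln(X), valid when
    the photocurrent is much larger than the dark saturation current.
    """
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    return K_B * temperature_k / Q_E * math.log(concentration)


# Hypothetical example: a tenfold effective concentration at 300 K.
gain_mv = 1e3 * voc_gain(10.0)
print(f"V_oc gain for X = 10: about {gain_mv:.1f} mV")
```

Each decade of effective concentration adds roughly 60 mV at room temperature in this approximation, which is the kind of gain a built-in concentration effect would provide.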
