Source author record

Dongsheng Wang

Dongsheng Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: quant-ph, Computation and Language, Information Retrieval, Machine Learning, Artificial Intelligence, Computer Vision, Methodology, physics.atom-ph, physics.gen-ph

Catalog footprint

What is connected

8 works

9 topics

4 close collaborators


Preprint · 2024 · arXiv

DocGraphLM: Documental Graph Language Model for Information Extraction

Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two families of architectures have emerged: transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs. DocGraphLM predicts both directions and distances between nodes using a convergent joint loss function that prioritizes neighborhood restoration and downweights distant node detection. Our experiments on three SotA datasets show consistent improvement on IE and QA tasks with the adoption of graph features. Moreover, we report that adopting the graph features accelerates convergence during training, despite the features being constructed solely through link prediction.
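The abstract describes a joint loss over node pairs that combines direction and distance prediction while downweighting distant pairs. The sketch below is one hypothetical way to express that idea, not the DocGraphLM loss itself: the direction classes, the squared-error distance term, the mixing factor `alpha`, and the weight 1 / (1 + distance) are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_link_loss(
    direction_logits: Sequence[Sequence[float]],
    true_directions: Sequence[int],
    predicted_distances: Sequence[float],
    true_distances: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Hypothetical joint loss for link prediction between node pairs.

    Each pair contributes a direction cross-entropy term plus a squared
    distance error, scaled by an ASSUMED weight 1 / (1 + true distance) so
    that nearby pairs (neighborhood restoration) dominate distant ones.
    """
    total = 0.0
    for logits, k, d_hat, d in zip(
        direction_logits, true_directions, predicted_distances, true_distances
    ):
        weight = 1.0 / (1.0 + d)             # downweight distant pairs
        ce = -math.log(softmax(logits)[k])   # direction classification term
        sq = (d_hat - d) ** 2                # distance regression term
        total += weight * (ce + alpha * sq)
    return total / len(true_directions)


# Same prediction error on a near pair and a far pair: the near pair costs more.
near = joint_link_loss([[0.0, 0.0]], [0], [1.5], [1.0])
far = joint_link_loss([[0.0, 0.0]], [0], [9.5], [9.0])
```

Any monotonically decreasing weight would serve the same purpose; the reciprocal form is used only because it is simple and keeps every pair's contribution finite.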

Preprint · 2023 · arXiv

DocLLM: A layout-aware generative language model for multimodal document understanding

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. Our model differs from existing multimodal LLMs by avoiding expensive image encoders and focusing exclusively on bounding box information to incorporate the spatial layout structure. Specifically, the cross-alignment between text and spatial modalities is captured by decomposing the attention mechanism in classical transformers into a set of disentangled matrices. Furthermore, we devise a pre-training objective that learns to infill text segments. This approach allows us to address irregular layouts and heterogeneous content frequently encountered in visual documents. The pre-trained model is fine-tuned using a large-scale instruction dataset covering four core document intelligence tasks. We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.
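One reading of "decomposing the attention mechanism into disentangled matrices" is that each attention logit is a sum of four cross terms: text-to-text, text-to-spatial, spatial-to-text, and spatial-to-spatial, where the spatial queries and keys come from bounding boxes rather than image pixels. The pure-Python sketch below illustrates that reading under stated assumptions: the projected text and box vectors are taken as given, the scalar weights `lam_ts`, `lam_st`, `lam_ss` are hypothetical hyperparameters, and a causal mask is applied because the model is generative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def disentangled_scores(q_text, k_text, q_box, k_box,
                        lam_ts=1.0, lam_st=1.0, lam_ss=1.0) -> Matrix:
    """Attention logits as a weighted sum of four text/box cross terms.

    q_text/k_text: projected text queries/keys, one vector per token.
    q_box/k_box:   projected bounding-box queries/keys for the same tokens.
    The lam_* weights are ASSUMED scalar hyperparameters.
    """
    dim = len(q_text[0])
    n = len(q_text)
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            s = (
                dot(q_text[i], k_text[j])              # text -> text
                + lam_ts * dot(q_text[i], k_box[j])    # text -> spatial
                + lam_st * dot(q_box[i], k_text[j])    # spatial -> text
                + lam_ss * dot(q_box[i], k_box[j])     # spatial -> spatial
            )
            row.append(s / math.sqrt(dim))             # scaled dot product
        scores.append(row)
    return scores


def causal_softmax(scores: Matrix) -> Matrix:
    """Row-wise softmax where token i may only attend to tokens 0..i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        weights.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return weights
```

Setting all `lam_*` weights to zero recovers ordinary text-only attention, which makes the spatial terms easy to ablate in a sketch like this.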

Preprint · 2022 · arXiv

Ordinal Graph Gamma Belief Network for Social Recommender Systems

To build recommender systems that not only consider user-item interactions represented as ordinal variables, but also exploit the social network describing the relationships between the users, we develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions. OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences. We further extend OGFA to ordinal graph gamma belief network, which is a multi-stochastic-layer deep probabilistic model that captures the user preferences and social communities at multiple semantic levels. For efficient inference, we develop a parallel hybrid Gibbs-EM algorithm, which exploits the sparsity of the graphs and is scalable to large datasets. Our experimental results show that the proposed models not only outperform recent baselines on recommendation datasets with explicit or implicit feedback, but also provide interpretable latent representations.

Preprint · 2022 · arXiv

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It focuses on capturing word co-occurrences in a document and hence often performs poorly on short documents. In addition, its parameter estimation often relies on approximate posterior inference that is either not scalable or suffers from large approximation error. This paper introduces a new topic-modeling framework where each document is viewed as a set of word embedding vectors and each topic is modeled as an embedding vector in the same embedding space. Embedding the words and topics in the same vector space, we define a method to measure the semantic difference between the embedding vectors of the words of a document and those of the topics, and optimize the topic embeddings to minimize the expected difference over all documents. Experiments on text analysis demonstrate that the proposed method, which is amenable to mini-batch stochastic gradient descent based optimization and hence scalable to big corpora, provides competitive performance in discovering more coherent and diverse topics and extracting better document representations.
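The abstract does not state how the "semantic difference" between a document's word embeddings and the topic embeddings is computed. The sketch below shows one plausible transport-style form, offered only as an assumption: the cost between a word and a topic is 1 minus their cosine similarity, and mass moves in both directions (word to topic, weighted by the topic proportions `theta`, and topic to word) via softmax-like conditional probabilities.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_cost(a: Vec, b: Vec) -> float:
    """Semantic cost between two embeddings: 1 - cosine similarity."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(weights: List[float]) -> List[float]:
    z = sum(weights)
    return [w / z for w in weights]


def doc_topic_cost(words: Sequence[Vec], topics: Sequence[Vec],
                   theta: Sequence[float]) -> float:
    """ASSUMED bidirectional transport cost between a document's word
    embeddings and topic embeddings with document proportions theta."""
    # Word -> topic: each word spreads its mass over topics, favouring topics
    # that are probable in this document and close in embedding space.
    forward = 0.0
    for w in words:
        pi = normalize([t * math.exp(dot(w, phi)) for t, phi in zip(theta, topics)])
        forward += sum(p * cosine_cost(w, phi) for p, phi in zip(pi, topics))
    forward /= len(words)

    # Topic -> word: each topic spreads its proportion over the document's words.
    backward = 0.0
    for t, phi in zip(theta, topics):
        pi = normalize([math.exp(dot(w, phi)) for w in words])
        backward += t * sum(p * cosine_cost(w, phi) for p, w in zip(pi, words))
    return forward + backward


words = [[1.0, 0.0], [1.0, 0.0]]
aligned = doc_topic_cost(words, [[1.0, 0.0]], [1.0])      # topic matches the words
orthogonal = doc_topic_cost(words, [[0.0, 1.0]], [1.0])   # topic unrelated to the words
```

Because every operation is differentiable in the topic embeddings, a cost of this shape could be averaged over a mini-batch of documents and minimized by stochastic gradient descent, which matches the scalability point the abstract makes.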

Preprint · 2022 · arXiv

Short Range Correlation Transformer for Occluded Person Re-Identification

Occluded person re-identification is a challenging area of computer vision that faces problems such as inefficient feature representation and low recognition accuracy. Convolutional neural networks focus mainly on extracting local features, so they struggle to extract features of occluded pedestrians and their results are unsatisfactory. Recently, vision transformers have been introduced into re-identification and achieve state-of-the-art results by modeling the relationships of global features between patch sequences. However, vision transformers are weaker than convolutional neural networks at extracting local features. Therefore, we design a partial feature transformer-based person re-identification framework named PFT. The proposed PFT uses three modules to enhance the efficiency of the vision transformer. (1) Patch full dimension enhancement module. We design a learnable tensor with the same size as the patch sequences, which is full-dimensional and deeply embedded in the patch sequences to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important parts of the obtained patch sequences and fuse them with the original patch sequence to reconstruct the original patch sequences. (3) Spatial slicing module. We slice and group patch sequences along the spatial direction, which can effectively improve the short-range correlation of patch sequences. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.
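The spatial slicing step in module (3) can be pictured with a small sketch. It assumes, hypothetically, that patches arrive in row-major order from a height x width grid and that "slicing along the spatial direction" means cutting the grid into equal horizontal stripes of whole rows; the abstract does not specify the direction, the number of groups, or what is done with each group afterward.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def spatial_slices(patches: Sequence[T], height: int, width: int,
                   num_groups: int) -> List[List[T]]:
    """Split a row-major patch sequence into horizontal stripes of whole rows.

    ASSUMPTIONS: patches come from a height x width grid in row-major order,
    and height is divisible by num_groups.
    """
    if len(patches) != height * width:
        raise ValueError("patch count does not match height * width")
    if height % num_groups != 0:
        raise ValueError("height must be divisible by num_groups")
    patches_per_group = (height // num_groups) * width
    return [
        list(patches[g * patches_per_group:(g + 1) * patches_per_group])
        for g in range(num_groups)
    ]
```

Grouping whole rows keeps vertically adjacent body parts together within a stripe, which is one way a short-range (local) correlation could be emphasized before attention is computed within each group.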

Preprint · 2015 · arXiv

Uncertainty Principle Respects Locality

The notion of nonlocality implicitly suggests that there might be some kind of spooky action at a distance in nature; however, the validity of quantum mechanics has been well tested to date. In this work we argue that the notion of nonlocality is physically improper and that the basic principle of locality in nature is well respected by quantum mechanics, specifically by the uncertainty principle. We show that the quantum bound on the Clauser, Horne, Shimony, and Holt (CHSH) inequality can be recovered from the uncertainty relation in a multipartite setting. We further argue that the super-quantum correlation demonstrated by the nonlocal box is not physically comparable with the quantum one. The origin of the quantum structure of nature remains to be explained; a post-quantum theory that is in some sense more complete than quantum mechanics is possible and need not be a hidden-variable theory.
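For reference, the CHSH bounds discussed in this abstract can be checked numerically with the standard textbook calculation, which is not the paper's uncertainty-relation derivation. Any local deterministic assignment of ±1 outcomes gives |S| ≤ 2, while measurements on a spin singlet at the usual optimal angles reach the quantum (Tsirelson) bound 2√2; the nonlocal (Popescu-Rohrlich) box would reach S = 4. The function names are illustrative.

```python
import itertools
import math


def singlet_correlation(a: float, b: float) -> float:
    """Quantum correlation E(a, b) = -cos(a - b) for spin measurements on a
    singlet pair along in-plane angles a and b (standard textbook result)."""
    return -math.cos(a - b)


def chsh(E, a: float, a2: float, b: float, b2: float) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)


def local_deterministic_max() -> int:
    """Largest |S| over all deterministic local +/-1 outcome assignments."""
    best = 0
    for A, A2, B, B2 in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(A * B - A * B2 + A2 * B + A2 * B2))
    return best


# Optimal angles: a = 0, a' = pi/2, b = pi/4, b' = 3*pi/4.
quantum_S = chsh(singlet_correlation, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
tsirelson = 2 * math.sqrt(2)
classical = local_deterministic_max()
```

The classical bound follows because S = A(B - B') + A'(B + B') and, for ±1 values, one of the two brackets is always zero while the other is ±2.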

Preprint · 2013 · arXiv

Complete positivity and contextuality of quantum dynamics

Positivity, or the stronger notion of complete positivity, and contextuality are central properties of quantum dynamics. In this work, we demonstrate that a physical unitary-universe dilation model can be employed to characterize the completely positive map, regardless of the initial correlation condition. In particular, the problem of initial correlation can be resolved by a swap operation. Furthermore, we discuss the physical essence of the completely positive map and highlight its limitations. We then develop the quantum measurement-chain formula beyond the framework of completely positive maps in order to describe much broader quantum dynamics, in which the property of contextuality can be captured via a measurement transfer matrix.
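As background, the textbook unitary (Stinespring-type) dilation of a completely positive, trace-preserving map is sketched below for the simplest case of an uncorrelated initial state with a pure environment state; this is exactly the assumption the paper sets out to move beyond. The symbols (system state \rho_S, environment basis \{|k\rangle_E\}, joint unitary U, Kraus operators K_k) are standard notation assumed here, not taken from the paper.

```latex
\mathcal{E}(\rho_S)
  = \operatorname{Tr}_E\!\left[\, U \left(\rho_S \otimes |0\rangle\langle 0|_E\right) U^{\dagger} \right]
  = \sum_{k} K_k \, \rho_S \, K_k^{\dagger},
\qquad
K_k = {}_E\langle k |\, U \,| 0 \rangle_E ,
\qquad
\sum_{k} K_k^{\dagger} K_k = I_S .
```

When the system and environment start out correlated, the joint state is no longer of the form \rho_S \otimes |0\rangle\langle 0|_E and this construction does not apply directly, which is the "initial correlation" problem the abstract says can be resolved by a swap operation.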

Preprint · 2010 · arXiv

Quantum interference in a four-level system of a $^{87}\mathrm{Rb}$ atom: Effects of spontaneously generated coherence

In this work, the effects of quantum interference and spontaneously generated coherence (SGC) are theoretically analyzed in a four-level system of a $^{87}\mathrm{Rb}$ atom. Regarding the effects of SGC, we find that a new kind of electromagnetically induced transparency (EIT) channel can be induced by destructive interference, and that the nonlinear Kerr absorption can be coherently narrowed or eliminated under different strengths of the coupling and switching fields.
