Source author record

Junyuan Gao

Junyuan Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, Computer Vision, Computation and Language

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

preprint, 2026, arXiv

PM4Bench: Benchmarking Large Vision-Language Models with Parallel Multilingual Multi-Modal Multi-task Corpus

While Large Vision-Language Models (LVLMs) demonstrate promising multilingual capabilities, their evaluation is currently hindered by two critical limitations: (1) the use of non-parallel corpora, which conflates inherent language capability gaps with dataset artifacts, precluding a fair assessment of cross-lingual alignment; and (2) disjointed multimodal inputs, which deviate from real-world scenarios where most texts are embedded within visual contexts. To address these challenges, we propose PM4Bench, the first Multilingual Multi-Modal Multi-task Benchmark constructed on a strictly parallel corpus across 10 languages. By eliminating content divergence, our benchmark enables a fair comparison of model capabilities across different languages. We also introduce a vision setting where textual queries are visually fused into images, compelling models to jointly "see," "read," and "think". Extensive evaluation of 10 LVLMs uncovers a substantial performance drop in the vision setting compared to standard inputs. Further analysis reveals that OCR capability is not only a general bottleneck but also contributes to cross-lingual performance disparities, suggesting that improving multilingual OCR is essential for advancing LVLM performance. We will release PM4Bench at https://github.com/opendatalab/PM4Bench .

preprint, 2026, arXiv

RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning

Large-scale chemical reaction datasets are crucial for AI research in chemistry. However, existing chemical reaction data often exist as images within papers, which makes them neither machine-readable nor usable for training machine learning models. In response to this challenge, we propose the RxnCaption framework for the task of chemical Reaction Diagram Parsing (RxnDP). Our framework reformulates the traditional coordinate-prediction-driven parsing process into an image captioning problem, which Large Vision Language Models (LVLMs) handle naturally. We introduce a strategy termed BBox and Index as Visual Prompt (BIVP), which uses our state-of-the-art molecular detector, MolYOLO, to pre-draw molecular bounding boxes and indices directly onto the input image. This turns the downstream parsing into a natural-language description problem. Extensive experiments show that the BIVP strategy significantly improves structural extraction quality while simplifying model design. We further construct the RxnCaption-15k dataset, an order of magnitude larger than prior real-world literature benchmarks, with a balanced test subset across four layout archetypes. Experiments demonstrate that RxnCaption-VL achieves state-of-the-art performance on multiple metrics. We believe our method, dataset, and models will advance structured information extraction from chemical literature and catalyze broader AI applications in chemistry. We will release data, models, and code on GitHub.

preprint, 2022, arXiv

Massive Unsourced Random Access: Exploiting Angular Domain Sparsity

This paper investigates the unsourced random access (URA) scheme to accommodate numerous machine-type users communicating with a base station equipped with multiple antennas. Existing works adopt a slotted transmission strategy to reduce system complexity; they operate under the framework of coupled compressed sensing (CCS), which concatenates an outer tree code to an inner compressed sensing code for slot-wise message stitching. We suggest that by exploiting the MIMO channel information in the angular domain, redundancies required by the tree encoder/decoder in CCS can be removed to improve spectral efficiency, yielding an uncoupled transmission protocol. To perform activity detection and channel estimation, we propose an expectation-maximization-aided generalized approximate message passing algorithm with a Markov random field support structure, which captures the inherent clustered sparsity structure of the angular domain channel. Then, message reconstruction in the form of a clustering decoder is performed by recognizing slot-distributed channels of each active user based on similarity. We put forward the slot-balanced K-means algorithm as the kernel of the clustering decoder, resolving constraints and collisions specific to the application scene. Extensive simulations reveal that the proposed scheme achieves better error performance at high spectral efficiency compared to the CCS-based URA schemes.

preprint, 2020, arXiv

Energy-efficiency of Massive Random Access with Individual Codebook

Massive machine-type communication is one of the most representative services for future wireless networks. It aims to support massive connectivity of user equipments (UEs) that sporadically transmit small packets. In this work, we assume the number of UEs grows linearly and unboundedly with blocklength and each UE has an individual codebook. Among all UEs, an unknown subset of UEs are active and transmit a fixed number of data bits to a base station over a shared-spectrum radio link. Under these settings, we derive the achievability and converse bounds on the minimum energy-per-bit for reliable random access over quasi-static fading channels with and without channel state information (CSI) at the receiver. These bounds provide energy-efficiency guidance for new schemes suited for massive random access. Simulation results indicate that the orthogonalization scheme TDMA is energy-inefficient for large values of UE density $μ$. Moreover, the multi-user interference can be perfectly cancelled when $μ$ is below a critical threshold. In the case of no-CSI, the energy-per-bit for random access is only slightly higher than that with knowledge of UE activity.

preprint, 2020, arXiv

Massive Unsourced Random Access for Massive MIMO Correlated Channels

This paper investigates massive random access for a huge number of user devices served by a base station (BS) equipped with a massive number of antennas. We consider a grant-free unsourced random access (U-RA) scheme where all users possess the same codebook and the BS aims at declaring a list of transmitted codewords and recovering the messages sent by active users. Most of the existing works concentrate on applying U-RA in oversimplified independent and identically distributed (i.i.d.) channels. In this paper, we consider a fairly general joint-correlated MIMO channel model with line-of-sight components for realistic outdoor wireless propagation environments. We conduct the activity detection for the emitted codewords by performing an improved coordinate descent approach with Bayesian learning automaton to solve a covariance-based maximum likelihood estimation problem. The proposed algorithm exhibits a faster convergence rate than traditional descent approaches. We further employ a coupled coding scheme to resolve the issue that the dimensions of the common codebook expand exponentially with user payload size in the practical massive machine-type communications scenario. Our simulations reveal that to achieve an error probability of 0.05 for reliable communications in correlated channels, one must pay a 0.9 to 1.3 dB penalty compared with the minimum signal-to-noise ratio needed in i.i.d. channels, provided that a sufficient number of receiving antennas is equipped at the BS.
