Source author record

Maojun Zhang

Maojun Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · eess.SP · eess.IV · Information Theory · math.IT

Catalog footprint

5 works

5 topics

4 close collaborators


Preprint · 2026 · arXiv

AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision

Despite the rapid progress in data-driven 3D vision, aerial geometric 3D vision remains a formidable challenge due to the severe scarcity of large-scale, high-fidelity training data. Existing benchmarks, predominantly biased toward ground-level or object-centric views, do not account for complex viewpoint transformations and diverse environmental conditions in UAV-based sensing. To bridge this critical gap, we propose AirZoo, a unified large-scale dataset and benchmark for grounding aerial geometric 3D vision. AirZoo possesses three appealing properties: 1) Scalable Generation Pipeline: Leveraging freely available, world-scale photogrammetric 3D meshes, it renders vast outdoor environments with customizable UAV flight trajectories and configurable weather/illumination. 2) Comprehensive Scene Diversity: It provides the most extensive coverage of region types to date (spanning 378 regions across 22 countries), systematically encompassing both highly structured urban landscapes and complex unstructured natural environments. 3) Rich Geometric Annotations: Each frame provides synchronized, pixel-level metric depth and precise 6-DoF geo-referenced poses, essential for geometry-aware learning. Through three rigorous evaluation tracks -- aerial image retrieval, cross-view matching, and multi-view 3D reconstruction -- we demonstrate that AirZoo serves as a powerful pre-training engine. Extensive experiments on both public and newly collected real-world benchmarks reveal that fine-tuning on AirZoo yields substantial performance gains for SoTA models (e.g., MegaLoc, RoMa, VGGT, and Depth Anything 3), establishing a new performance upper bound for aerial spatial intelligence.
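Pairing per-pixel metric depth with a 6-DoF pose means every pixel can be lifted to a 3D point in a shared world frame. Below is a minimal NumPy sketch of that operation. It assumes a pinhole intrinsic matrix `K`, depth stored as z-depth along the optical axis rather than ray length, and a camera-to-world pose `(R_cw, t_cw)`. These conventions and the function name `backproject` are illustrative only and are not AirZoo's documented format.

```python
import numpy as np

def backproject(depth: np.ndarray, K: np.ndarray, R_cw: np.ndarray, t_cw: np.ndarray) -> np.ndarray:
    """Lift an (H, W) z-depth map to world-frame 3D points of shape (H*W, 3).

    Assumed conventions (hypothetical, not AirZoo's spec): pinhole intrinsics K,
    depth measured along the camera z-axis, and a camera-to-world pose so that
    X_world = R_cw @ X_cam + t_cw.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel columns and rows, each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T                 # camera-frame directions with z = 1
    pts_cam = rays * depth.reshape(-1, 1)           # scale each ray by its metric depth
    return pts_cam @ R_cw.T + t_cw                  # rigid transform into the world frame

# Toy example: a 3x5 image at constant 10 m depth, camera at the world origin.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 10.0)
pts = backproject(depth, K, np.eye(3), np.zeros(3))
```

Projecting the points back through `K` should recover the original pixel grid. This gives a quick way to check the assumed depth and pose conventions against real annotations before using them for training.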

Preprint · 2026 · arXiv

ICWLM: A Multi-Task Wireless Large Model via In-Context Learning

The rapid evolution of wireless communication technologies, particularly massive multiple-input multiple-output (mMIMO) and millimeter-wave (mmWave) communication, introduces significant network complexity and computational demands. Substantial research effort has gone into improving physical-layer performance with deep learning (DL) methods; however, these methods are usually task-specific and struggle with data scarcity and generalization. To address these challenges, we propose a novel In-Context Wireless Large Model (ICWLM), a wireless-native foundation model designed for simultaneous multi-task learning at the physical layer. Unlike conventional methods that adapt wireless data to pre-trained large language models (LLMs), ICWLM is trained from scratch directly on large-scale, mixed wireless datasets. It jointly solves multiple classical physical-layer problems, including multi-user precoding (sum-rate maximization and max-min SINR) and channel prediction. A key innovation of ICWLM is its use of in-context learning (ICL), which enables the model to adapt to varying system configurations and channel conditions from only a few demonstration pairs, eliminating the need for extensive retraining. Extensive simulation results demonstrate that ICWLM achieves performance competitive with task-specific methods while exhibiting remarkable generalization to unseen system configurations. This work offers a promising paradigm for developing unified and adaptive AI models for future wireless networks, potentially reducing deployment complexity and enhancing intelligent resource management.
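For reference, the two multi-user precoding tasks named above are commonly posed as follows for a downlink with $K$ single-antenna users. The notation (channels $\mathbf{h}_k$, precoding vectors $\mathbf{w}_k$, noise power $\sigma^2$, total power budget $P$) is the standard textbook formulation and is assumed here. It is not taken from the ICWLM paper.

```latex
\mathrm{SINR}_k(\mathbf{W}) = \frac{|\mathbf{h}_k^{H}\mathbf{w}_k|^2}{\sum_{j \neq k} |\mathbf{h}_k^{H}\mathbf{w}_j|^2 + \sigma^2},
\qquad \mathbf{W} = [\mathbf{w}_1, \dots, \mathbf{w}_K]

\text{Sum-rate maximization:}\quad \max_{\mathbf{W}} \sum_{k=1}^{K} \log_2\!\bigl(1 + \mathrm{SINR}_k(\mathbf{W})\bigr)
\quad \text{s.t.} \quad \sum_{k=1}^{K} \|\mathbf{w}_k\|^2 \le P

\text{Max-min SINR:}\quad \max_{\mathbf{W}} \; \min_{k} \; \mathrm{SINR}_k(\mathbf{W})
\quad \text{s.t.} \quad \sum_{k=1}^{K} \|\mathbf{w}_k\|^2 \le P
```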

Preprint · 2022 · arXiv

A Deep Learning-Based Framework for Low Complexity Multi-User MIMO Precoding Design

Using precoding to suppress multi-user interference is a well-known technique for improving spectral efficiency in multi-user multiple-input multiple-output (MU-MIMO) systems, and the pursuit of high-performance, low-complexity precoding methods has been a research focus over the last decade. Traditional algorithms, including the zero-forcing (ZF) algorithm and the weighted minimum mean square error (WMMSE) algorithm, fail to achieve a satisfactory trade-off between complexity and performance. In this paper, leveraging the power of deep learning, we propose a low-complexity precoding design framework for MU-MIMO systems. The key idea is to transform the MIMO precoding problem into a multiple-input single-output (MISO) precoding problem, for which the optimal precoding structure can be obtained in closed form. A customized deep neural network is designed to fit the mapping from the channels to the precoding matrix. In addition, input dimensionality reduction, network pruning, and recovery module compression are used to further improve computational efficiency. Furthermore, the extension to practical MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) systems is studied. Simulation results show that the proposed low-complexity precoding scheme achieves performance similar to that of the WMMSE algorithm at very low computational complexity.
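The zero-forcing (ZF) baseline mentioned above is short enough to show in full. The following is a minimal NumPy sketch of the textbook ZF precoder for single-antenna users under a single total-power constraint. It is not the paper's deep-learning framework. The names `zf_precoder` and `sum_rate` are hypothetical.

```python
import numpy as np

def zf_precoder(H: np.ndarray, total_power: float = 1.0) -> np.ndarray:
    """Textbook zero-forcing precoder for K single-antenna users, M >= K antennas.

    H has shape (K, M); row k is the conjugated channel h_k^H of user k.
    Returns W of shape (M, K) with H @ W diagonal (no inter-user interference),
    scaled so the total transmit power ||W||_F^2 equals total_power.
    """
    # Right pseudo-inverse H^H (H H^H)^{-1} nulls every cross-user term.
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    # Common scaling to meet the total power budget.
    return W * (np.sqrt(total_power) / np.linalg.norm(W, "fro"))

def sum_rate(H: np.ndarray, W: np.ndarray, noise_power: float = 1.0) -> float:
    """Sum of log2(1 + SINR_k) over users for precoder W."""
    G = H @ W                                    # G[k, j] = h_k^H w_j
    signal = np.abs(np.diag(G)) ** 2
    interference = np.sum(np.abs(G) ** 2, axis=1) - signal
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))

# Example: 4 users, 8 transmit antennas, i.i.d. Rayleigh fading channels.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
W = zf_precoder(H, total_power=10.0)
print(sum_rate(H, W))
```

ZF is cheap because it needs only one matrix inversion. However, the inversion spends a large share of transmit power when user channels are nearly collinear. Iterative WMMSE generally performs better at a higher computational cost, and this is the trade-off the abstract describes.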

Preprint · 2020 · arXiv

DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation

Automatic segmentation of cardiac magnetic resonance imaging (MRI) enables efficient and accurate volume measurement in clinical applications. However, because of anisotropic resolution and ambiguous borders (e.g., the right ventricular endocardium), existing methods suffer from degraded accuracy and robustness in 3D cardiac MRI video segmentation. In this paper, we propose a novel Deformable U-Net (DeU-Net) that fully exploits spatio-temporal information from 3D cardiac MRI video through two components: a Temporal Deformable Aggregation Module (TDAM) and a Deformable Global Position Attention (DGPA) network. First, the TDAM takes a cardiac MRI video clip as input and extracts temporal information with an offset prediction network. The extracted temporal information is then fused by a temporal aggregation deformable convolution to produce fused feature maps. Furthermore, to aggregate meaningful features, we devise the DGPA network using a deformable attention U-Net, which can encode a wider range of multi-dimensional contextual information into global and local features. Experimental results show that DeU-Net achieves state-of-the-art performance on commonly used evaluation metrics, especially on cardiac boundary metrics: average symmetric surface distance (ASSD) and Hausdorff distance (HD).
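The two boundary metrics cited above, Hausdorff distance (HD) and average symmetric surface distance (ASSD), compare the surface points of a predicted segmentation with those of the ground truth. Below is a minimal brute-force NumPy sketch. It assumes boundary points have already been extracted as coordinate arrays, with voxel spacing applied if physical units are wanted. The function names are hypothetical. The sketch suits only small point sets, and some evaluations report a percentile variant of HD instead of the maximum used here.

```python
import numpy as np

def _nearest_distances(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """For point sets a (N, D) and b (M, D), return each a-point's distance to the
    nearest b-point and each b-point's distance to the nearest a-point."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1), d.min(axis=0)

def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Largest distance from any surface point to the other surface."""
    d_ab, d_ba = _nearest_distances(a, b)
    return float(max(d_ab.max(), d_ba.max()))

def assd(a: np.ndarray, b: np.ndarray) -> float:
    """Mean nearest-surface distance, averaged over both directions."""
    d_ab, d_ba = _nearest_distances(a, b)
    return float((d_ab.sum() + d_ba.sum()) / (len(a) + len(b)))

pred = np.array([[0.0, 0.0], [1.0, 0.0]])
truth = np.array([[0.0, 0.0], [3.0, 0.0]])
print(hausdorff_distance(pred, truth), assd(pred, truth))  # 2.0 0.75
```

HD is driven by the single worst boundary error, while ASSD averages over all boundary points. Reporting both shows whether a method's boundary errors are isolated outliers or spread along the contour.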

Preprint · 2020 · arXiv

Image Retrieval for Structure-from-Motion via Graph Convolutional Network

Conventional image retrieval techniques for Structure-from-Motion (SfM) have limited ability to recognize repetitive patterns and cannot guarantee a set of match pairs that is just large enough while keeping both precision and recall high. In this paper, we present a novel retrieval method based on a Graph Convolutional Network (GCN) that generates accurate pairwise matches without costly redundancy. We formulate the image retrieval task as a node binary classification problem on graph data: a node is labeled positive if its image shares scene overlap with the query image. The key insight is that the local context in feature space around a query image contains rich information about whether that image and its neighbors are matchable. Taking a subgraph constructed around the query image as input, a learnable GCN determines whether each node in the subgraph has regions overlapping with the query image. Experiments demonstrate that our method performs remarkably well on a challenging dataset of highly ambiguous and duplicated scenes. Moreover, compared with state-of-the-art matchable-image retrieval methods, the proposed approach significantly reduces useless match attempts without sacrificing the accuracy or completeness of the reconstruction.
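A generic two-layer GCN can illustrate the node-classification formulation above. The following is a minimal NumPy sketch, not the paper's network. The subgraph construction (the query plus its `k` nearest neighbours by cosine similarity, each linked to its `m` nearest nodes in the subgraph), the Kipf and Welling symmetric normalization, and all helper names are assumptions. The weights are random rather than trained, so the scores are placeholders.

```python
import numpy as np

def build_query_subgraph(features: np.ndarray, query_idx: int, k: int = 8, m: int = 3):
    """Hypothetical construction: query + its k nearest neighbours (cosine), each
    node linked to its m nearest nodes inside the subgraph. Returns (nodes, X, A)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    order = np.argsort(-(f @ f[query_idx]))          # most similar first
    neighbours = [i for i in order if i != query_idx][:k]
    nodes = np.array([query_idx] + neighbours)       # query is always node 0
    x = f[nodes]
    s = x @ x.T
    np.fill_diagonal(s, -np.inf)                     # never pick a node as its own neighbour
    A = np.zeros_like(s)
    for i, row in enumerate(s):
        A[i, np.argsort(-row)[:m]] = 1.0
    return nodes, x, np.maximum(A, A.T)              # symmetrize: undirected graph

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """ReLU(D^-1/2 (A + I) D^-1/2 H W), the standard Kipf & Welling propagation rule."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def overlap_scores(A, X, W1, W2, w_out) -> np.ndarray:
    """Per-node probability (sigmoid) that the node overlaps the query image."""
    h = gcn_layer(A, gcn_layer(A, X, W1), W2)
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))

rng = np.random.default_rng(0)
features = rng.standard_normal((50, 16))             # stand-in global image descriptors
nodes, X, A = build_query_subgraph(features, query_idx=0, k=8, m=3)
W1 = rng.standard_normal((16, 32)) * 0.1              # untrained, illustrative weights
W2 = rng.standard_normal((32, 32)) * 0.1
w_out = rng.standard_normal(32) * 0.1
probs = overlap_scores(A, X, W1, W2, w_out)
candidates = nodes[1:][probs[1:] > 0.5]               # would become match pairs after training
```

Restricting the input to a local subgraph lets the classifier score each neighbour using the feature-space context around the query, instead of ranking images by pairwise similarity alone.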
