Researcher profile

Maojun Zhang

Maojun Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision

Despite the rapid progress in data-driven 3D vision, aerial geometric 3D vision remains a formidable challenge due to the severe scarcity of large-scale, high-fidelity training data. Existing benchmarks, predominantly biased toward ground-level or object-centric views, do not account for complex viewpoint transformations and diverse environmental conditions in UAV-based sensing. To bridge this critical gap, we propose AirZoo, a unified large-scale dataset and benchmark for grounding aerial geometric 3D vision. AirZoo possesses three appealing properties: 1) Scalable Generation Pipeline: Leveraging freely available, world-scale photogrammetric 3D meshes, it renders vast outdoor environments with customizable UAV flight trajectories and configurable weather/illumination. 2) Comprehensive Scene Diversity: It provides the most extensive coverage of region types to date (spanning 378 regions across 22 countries), systematically encompassing both highly structured urban landscapes and complex unstructured natural environments. 3) Rich Geometric Annotations: Each frame provides synchronized, pixel-level metric depth and precise 6-DoF geo-referenced poses, essential for geometry-aware learning. Through three rigorous evaluation tracks -- aerial image retrieval, cross-view matching, and multi-view 3D reconstruction -- we demonstrate that AirZoo serves as a powerful pre-training engine. Extensive experiments on both public and newly collected real-world benchmarks reveal that fine-tuning on AirZoo yields substantial performance gains for SoTA models (e.g., MegaLoc, RoMa, VGGT, and Depth Anything 3), establishing a new performance upper bound for aerial spatial intelligence.

preprint2026arXiv

ICWLM: A Multi-Task Wireless Large Model via In-Context Learning

The rapid evolution of wireless communication technologies, particularly massive multiple-input multiple-output (mMIMO) and millimeter-wave (mmWave), introduces significant network complexity and computational demands. Significant research efforts have been made to improve physical layer performance by resorting to deep learning (DL) methods, which, however, are usually task-specific and struggle with data scarcity and generalization. To address these challenges, we propose a novel In-Context Wireless Large Model (ICWLM), a wireless-native foundation model designed for simultaneous multi-task learning at the physical layer. Unlike conventional methods that adapt wireless data to pre-trained large language models (LLMs), ICWLM is trained directly on large-scale, mixed wireless datasets from scratch. It jointly solves multiple classical physical layer problems, including multi-user precoding (sum-rate maximization and max-min SINR) and channel prediction. A key innovation of ICWLM is its utilization of in-context learning (ICL), enabling the model to adapt to varying system configurations and channel conditions with minimal demonstration pairs, eliminating the need for extensive retraining. Extensive simulation results demonstrate that ICWLM achieves competitive performance compared to task-specific methods while exhibiting remarkable generalization capabilities to unseen system configurations. This work offers a promising paradigm for developing unified and adaptive AI models for future wireless networks, potentially reducing deployment complexity and enhancing intelligent resource management.

preprint2022arXiv

A Deep Learning-Based Framework for Low Complexity Multi-User MIMO Precoding Design

Using precoding to suppress multi-user interference is a well-known technique to improve spectra efficiency in multiuser multiple-input multiple-output (MU-MIMO) systems, and the pursuit of high performance and low complexity precoding method has been the focus in the last decade. The traditional algorithms including the zero-forcing (ZF) algorithm and the weighted minimum mean square error (WMMSE) algorithm failed to achieve a satisfactory trade-off between complexity and performance. In this paper, leveraging on the power of deep learning, we propose a low-complexity precoding design framework for MU-MIMO systems. The key idea is to transform the MIMO precoding problem into the multiple-input single-output precoding problem, where the optimal precoding structure can be obtained in closed-form. A customized deep neural network is designed to fit the mapping from the channels to the precoding matrix. In addition, the technique of input dimensionality reduction, network pruning, and recovery module compression are used to further improve the computational efficiency. Furthermore, the extension to the practical MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) system is studied. Simulation results show that the proposed low-complexity precoding scheme achieves similar performance as the WMMSE algorithm with very low computational complexity.

preprint2020arXiv

DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation

Automatic segmentation of cardiac magnetic resonance imaging (MRI) facilitates efficient and accurate volume measurement in clinical applications. However, due to anisotropic resolution and ambiguous border (e.g., right ventricular endocardium), existing methods suffer from the degradation of accuracy and robustness in 3D cardiac MRI video segmentation. In this paper, we propose a novel Deformable U-Net (DeU-Net) to fully exploit spatio-temporal information from 3D cardiac MRI video, including a Temporal Deformable Aggregation Module (TDAM) and a Deformable Global Position Attention (DGPA) network. First, the TDAM takes a cardiac MRI video clip as input with temporal information extracted by an offset prediction network. Then we fuse extracted temporal information via a temporal aggregation deformable convolution to produce fused feature maps. Furthermore, to aggregate meaningful features, we devise the DGPA network by employing deformable attention U-Net, which can encode a wider range of multi-dimensional contextual information into global and local features. Experimental results show that our DeU-Net achieves the state-of-the-art performance on commonly used evaluation metrics, especially for cardiac marginal information (ASSD and HD).

preprint2020arXiv

Image Retrieval for Structure-from-Motion via Graph Convolutional Network

Conventional image retrieval techniques for Structure-from-Motion (SfM) suffer from the limit of effectively recognizing repetitive patterns and cannot guarantee to create just enough match pairs with high precision and high recall. In this paper, we present a novel retrieval method based on Graph Convolutional Network (GCN) to generate accurate pairwise matches without costly redundancy. We formulate image retrieval task as a node binary classification problem in graph data: a node is marked as positive if it shares the scene overlaps with the query image. The key idea is that we find that the local context in feature space around a query image contains rich information about the matchable relation between this image and its neighbors. By constructing a subgraph surrounding the query image as input data, we adopt a learnable GCN to exploit whether nodes in the subgraph have overlapping regions with the query photograph. Experiments demonstrate that our method performs remarkably well on the challenging dataset of highly ambiguous and duplicated scenes. Besides, compared with state-of-the-art matchable retrieval methods, the proposed approach significantly reduces useless attempted matches without sacrificing the accuracy and completeness of reconstruction.