Researcher profile

Xu Ma

Xu Ma contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning

Advanced chart question answering requires both precise perception of small visual elements and multi-step reasoning across several subplots. While existing MLLMs are strong at understanding single plots, they often struggle with multi-step reasoning across multiple subplots. We propose HierVA, a hierarchical visual agent framework for chart reasoning that iteratively constructs and updates a working context in a joint image--text space. A high-level manager generates plans and maintains a compact context containing only key information, while specialized workers perform reasoning, gather evidence, and return results. In particular, the agent maintains separate visual and textual contexts, using a zoom-in tool to restrict the visual context. Experiments on the CharXiv reasoning subset demonstrate consistent improvements over strong multimodal baselines, and ablation studies verify that hierarchical architecture, scoped visual context, and distilled context contribute complementary gains.

preprint2026arXiv

Segmentation-Driven Monocular Shape from Polarization based on Physical Model

Monocular shape-from-polarization (SfP) leverages the intrinsic relationship between light polarization properties and surface geometry to recover surface normals from single-view polarized images, providing a compact and robust approach for three-dimensional (3D) reconstruction. Despite its potential, existing monocular SfP methods suffer from azimuth angle ambiguity, an inherent limitation of polarization analysis, that severely compromises reconstruction accuracy and stability. This paper introduces a novel segmentation-driven monocular SfP (SMSfP) framework that reformulates global shape recovery into a set of local reconstructions over adaptively segmented convex sub-regions. Specifically, a polarization-aided adaptive region growing (PARG) segmentation strategy is proposed to decompose the global convexity assumption into locally convex regions, effectively suppressing azimuth ambiguities and preserving surface continuity. Furthermore, a multi-scale fusion convexity prior (MFCP) constraint is developed to ensure local surface consistency and enhance the recovery of fine textural and structural details. Extensive experiments on both synthetic and real-world datasets validate the proposed approach, showing significant improvements in disambiguation accuracy and geometric fidelity compared with existing physics-based monocular SfP techniques.

preprint2022arXiv

Attention-based Cross-Layer Domain Alignment for Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to learn transferable knowledge from a labeled source domain and adapts a trained model to an unlabeled target domain. To bridge the gap between source and target domains, one prevailing strategy is to minimize the distribution discrepancy by aligning their semantic features extracted by deep models. The existing alignment-based methods mainly focus on reducing domain divergence in the same model layer. However, the same level of semantic information could distribute across model layers due to the domain shifts. To further boost model adaptation performance, we propose a novel method called Attention-based Cross-layer Domain Alignment (ACDA), which captures the semantic relationship between the source and target domains across model layers and calibrates each level of semantic information automatically through a dynamic attention mechanism. An elaborate attention mechanism is designed to reweight each cross-layer pair based on their semantic similarity for precise domain alignment, effectively matching each level of semantic information during model adaptation. Extensive experiments on multiple benchmark datasets consistently show that the proposed method ACDA yields state-of-the-art performance.

preprint2022arXiv

Label-Efficient Domain Generalization via Collaborative Exploration and Generalization

Considerable progress has been made in domain generalization (DG) which aims to learn a generalizable model from multiple well-annotated source domains to unknown target domains. However, it can be prohibitively expensive to obtain sufficient annotation for source datasets in many real scenarios. To escape from the dilemma between domain generalization and annotation costs, in this paper, we introduce a novel task named label-efficient domain generalization (LEDG) to enable model generalization with label-limited source domains. To address this challenging task, we propose a novel framework called Collaborative Exploration and Generalization (CEG) which jointly optimizes active exploration and semi-supervised generalization. Specifically, in active exploration, to explore class and domain discriminability while avoiding information divergence and redundancy, we query the labels of the samples with the highest overall ranking of class uncertainty, domain representativeness, and information diversity. In semi-supervised generalization, we design MixUp-based intra- and inter-domain knowledge augmentation to expand domain knowledge and generalize domain invariance. We unify active exploration and semi-supervised generalization in a collaborative way and promote mutual enhancement between them, boosting model generalization with limited annotation. Extensive experiments show that CEG yields superior generalization performance. In particular, CEG can even use only 5% data annotation budget to achieve competitive results compared to the previous DG methods with fully labeled data on PACS dataset.

preprint2022arXiv

Predicting Peak Day and Peak Hour of Electricity Demand with Ensemble Machine Learning

Battery energy storage systems can be used for peak demand reduction in power systems, leading to significant economic benefits. Two practical challenges are 1) accurately determining the peak load days and hours and 2) quantifying and reducing uncertainties associated with the forecast in probabilistic risk measures for dispatch decision-making. In this study, we develop a supervised machine learning approach to generate 1) the probability of the next operation day containing the peak hour of the month and 2) the probability of an hour to be the peak hour of the day. Guidance is provided on the preparation and augmentation of data as well as the selection of machine learning models and decision-making thresholds. The proposed approach is applied to the Duke Energy Progress system and successfully captures 69 peak days out of 72 testing months with a 3% exceedance probability threshold. On 90% of the peak days, the actual peak hour is among the 2 hours with the highest probabilities.

preprint2022arXiv

Towards Layer-wise Image Vectorization

Image rasterization is a mature technique in computer graphics, while image vectorization, the reverse path of rasterization, remains a major challenge. Recent advanced deep learning-based models achieve vectorization and semantic interpolation of vector graphs and demonstrate a better topology of generating new figures. However, deep models cannot be easily generalized to out-of-domain testing data. The generated SVGs also contain complex and redundant shapes that are not quite convenient for further editing. Specifically, the crucial layer-wise topology and fundamental semantics in images are still not well understood and thus not fully explored. In this work, we propose Layer-wise Image Vectorization, namely LIVE, to convert raster images to SVGs and simultaneously maintain its image topology. LIVE can generate compact SVG forms with layer-wise structures that are semantically consistent with human perspective. We progressively add new bezier paths and optimize these paths with the layer-wise framework, newly designed loss functions, and component-wise path initialization technique. Our experiments demonstrate that LIVE presents more plausible vectorized forms than prior works and can be generalized to new images. With the help of this newly learned topology, LIVE initiates human editable SVGs for both designers and other downstream applications. Codes are made available at https://github.com/Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization.

preprint2020arXiv

DCANet: Learning Connected Attentions for Convolutional Neural Networks

While self-attention mechanism has shown promising results for many vision tasks, it only considers the current features at a time. We show that such a manner cannot take full advantage of the attention mechanism. In this paper, we present Deep Connected Attention Network (DCANet), a novel design that boosts attention modules in a CNN model without any modification of the internal structure. To achieve this, we interconnect adjacent attention blocks, making information flow among attention blocks possible. With DCANet, all attention blocks in a CNN model are trained jointly, which improves the ability of attention learning. Our DCANet is generic. It is not limited to a specific attention module or base network architecture. Experimental results on ImageNet and MS COCO benchmarks show that DCANet consistently outperforms the state-of-the-art attention modules with a minimal additional computational overhead in all test cases. All code and models are made publicly available.

preprint2020arXiv

Model Predictive Control of Discrete-Continuous Energy Systems via Generalized Disjunctive Programming

Generalized Disjunctive Programming (GDP) provides an alternative framework to model optimization problems with both discrete and continuous variables. The key idea behind GDP involves the use of logical disjunctions to represent discrete decisions in the continuous space, and logical propositions to denote algebraic constraints in the discrete space. Compared to traditional mixed-integer programming (MIP), the inherent logic structure in GDP yields tighter relaxations that are exploited by global branch and bound algorithms to improve solution quality. In this paper, we present a general GDP model for optimal control of hybrid systems that exhibit both discrete and continuous dynamics. Specifically, we use GDP to formulate a model predictive control (MPC) model for piecewise-affine systems with implicit switching logic. As an example, the GDP-based MPC approach is used as a supervisory control to improve energy efficiency in residential buildings with binary on/off, relay-based thermostats. A simulation study is used to demonstrate the validity of the proposed approach, and the improved solution quality compared to existing MIP-based control approaches.

preprint2020arXiv

Polestar: An Intelligent, Efficient and National-Wide Public Transportation Routing Engine

Public transportation plays a critical role in people's daily life. It has been proven that public transportation is more environmentally sustainable, efficient, and economical than any other forms of travel. However, due to the increasing expansion of transportation networks and more complex travel situations, people are having difficulties in efficiently finding the most preferred route from one place to another through public transportation systems. To this end, in this paper, we present Polestar, a data-driven engine for intelligent and efficient public transportation routing. Specifically, we first propose a novel Public Transportation Graph (PTG) to model public transportation system in terms of various travel costs, such as time or distance. Then, we introduce a general route search algorithm coupled with an efficient station binding method for efficient route candidate generation. After that, we propose a two-pass route candidate ranking module to capture user preferences under dynamic travel situations. Finally, experiments on two real-world data sets demonstrate the advantages of Polestar in terms of both efficiency and effectiveness. Indeed, in early 2019, Polestar has been deployed on Baidu Maps, one of the world's largest map services. To date, Polestar is servicing over 330 cities, answers over a hundred millions of queries each day, and achieves substantial improvement of user click ratio.

preprint2019arXiv

Compressive spectral imaging based on hexagonal blue noise coded apertures

Coded aperture snapshot spectral imager (CASSI) is a computational imaging system that acquires a three dimensional (3D) spectral data cube by single or a few two dimensional (2D) measurements. Binary random coded apertures with square pixels are primarily implemented in CASSI systems to modulate the spectral images in spatial domain. The design and optimization of coded apertures was shown to improve the imaging performance of these systems significantly. This work proposes a different approach to code design. Instead of traditional squared tiled coded elements, hexagonal tiled elements are used. The dislocation between the binary hexagonal pixels on coded apertures and the square pixels on detector introduces equivalent grey-scale spatial modulation to increase the degrees of freedom in the sensing matrix, thus further improving the spectral imaging performance. Then, this paper presents an optimal structure under the criterion of satisfying the restricted isometry property (RIP) with high probability, coined blue noise (BN) coded apertures. In addition, this paper studies and verifies the proposed hexagonal blue noise coded aperture method on a general CASSI system, where the resolution of the coded aperture is equivalent to that of the detector. Based on the RIP criterion, this paper theoretically proves the superiority of the hexagonal blue noise coded aperture over the traditional random coded aperture with square lattice.