Source author record

Xu Ma

Xu Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Computer Vision, cond-mat.soft, cond-mat.mtrl-sci, Machine Learning, Artificial Intelligence, Computation and Language, cond-mat.mes-hall, eess.IV, math.OC, physics.comp-ph, physics.data-an

Catalog footprint

What is connected

15 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning

Advanced chart question answering requires both precise perception of small visual elements and multi-step reasoning across several subplots. While existing MLLMs are strong at understanding single plots, they often struggle with multi-step reasoning across multiple subplots. We propose HierVA, a hierarchical visual agent framework for chart reasoning that iteratively constructs and updates a working context in a joint image-text space. A high-level manager generates plans and maintains a compact context containing only key information, while specialized workers perform reasoning, gather evidence, and return results. In particular, the agent maintains separate visual and textual contexts, using a zoom-in tool to restrict the visual context. Experiments on the CharXiv reasoning subset demonstrate consistent improvements over strong multimodal baselines, and ablation studies verify that the hierarchical architecture, scoped visual context, and distilled context contribute complementary gains.

preprint, 2026, arXiv

Segmentation-Driven Monocular Shape from Polarization based on Physical Model

Monocular shape-from-polarization (SfP) leverages the intrinsic relationship between light polarization properties and surface geometry to recover surface normals from single-view polarized images, providing a compact and robust approach for three-dimensional (3D) reconstruction. Despite its potential, existing monocular SfP methods suffer from azimuth angle ambiguity, an inherent limitation of polarization analysis, that severely compromises reconstruction accuracy and stability. This paper introduces a novel segmentation-driven monocular SfP (SMSfP) framework that reformulates global shape recovery into a set of local reconstructions over adaptively segmented convex sub-regions. Specifically, a polarization-aided adaptive region growing (PARG) segmentation strategy is proposed to decompose the global convexity assumption into locally convex regions, effectively suppressing azimuth ambiguities and preserving surface continuity. Furthermore, a multi-scale fusion convexity prior (MFCP) constraint is developed to ensure local surface consistency and enhance the recovery of fine textural and structural details. Extensive experiments on both synthetic and real-world datasets validate the proposed approach, showing significant improvements in disambiguation accuracy and geometric fidelity compared with existing physics-based monocular SfP techniques.

preprint, 2022, arXiv

Attention-based Cross-Layer Domain Alignment for Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to learn transferable knowledge from a labeled source domain and adapt a trained model to an unlabeled target domain. To bridge the gap between source and target domains, one prevailing strategy is to minimize the distribution discrepancy by aligning their semantic features extracted by deep models. The existing alignment-based methods mainly focus on reducing domain divergence in the same model layer. However, the same level of semantic information may be distributed across model layers due to domain shifts. To further boost model adaptation performance, we propose a novel method called Attention-based Cross-layer Domain Alignment (ACDA), which captures the semantic relationship between the source and target domains across model layers and calibrates each level of semantic information automatically through a dynamic attention mechanism. An elaborate attention mechanism is designed to reweight each cross-layer pair based on their semantic similarity for precise domain alignment, effectively matching each level of semantic information during model adaptation. Extensive experiments on multiple benchmark datasets consistently show that the proposed method ACDA yields state-of-the-art performance.

preprint, 2022, arXiv

Label-Efficient Domain Generalization via Collaborative Exploration and Generalization

Considerable progress has been made in domain generalization (DG), which aims to learn a generalizable model from multiple well-annotated source domains to unknown target domains. However, it can be prohibitively expensive to obtain sufficient annotation for source datasets in many real scenarios. To escape from the dilemma between domain generalization and annotation costs, in this paper, we introduce a novel task named label-efficient domain generalization (LEDG) to enable model generalization with label-limited source domains. To address this challenging task, we propose a novel framework called Collaborative Exploration and Generalization (CEG), which jointly optimizes active exploration and semi-supervised generalization. Specifically, in active exploration, to explore class and domain discriminability while avoiding information divergence and redundancy, we query the labels of the samples with the highest overall ranking of class uncertainty, domain representativeness, and information diversity. In semi-supervised generalization, we design MixUp-based intra- and inter-domain knowledge augmentation to expand domain knowledge and generalize domain invariance. We unify active exploration and semi-supervised generalization in a collaborative way and promote mutual enhancement between them, boosting model generalization with limited annotation. Extensive experiments show that CEG yields superior generalization performance. In particular, CEG can use only a 5% data annotation budget to achieve competitive results compared to previous DG methods with fully labeled data on the PACS dataset.
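For readers unfamiliar with MixUp, the generic operation the abstract builds on can be sketched as follows. This is a minimal illustration of standard MixUp only, not the CEG augmentation itself, which this listing does not specify. The function name `mixup`, the `alpha` default, and the use of plain Python lists are assumptions made for the example; whether the two samples come from the same domain (intra-domain) or different domains (inter-domain) is left to the caller.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels (generic MixUp sketch).

    x_a, x_b: feature vectors of equal length.
    y_a, y_b: one-hot (or soft) label vectors of equal length.
    alpha:    Beta(alpha, alpha) shape parameter; 0.2 is an illustrative default.
    rng:      the random module or a random.Random instance.
    """
    # Mixing weight drawn from a symmetric Beta distribution.
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of inputs and of labels with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam
```

Passing a seeded `random.Random` instance as `rng` makes the mixing weight reproducible, which is convenient when comparing augmentation settings.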

preprint, 2022, arXiv

Predicting Peak Day and Peak Hour of Electricity Demand with Ensemble Machine Learning

Battery energy storage systems can be used for peak demand reduction in power systems, leading to significant economic benefits. Two practical challenges are 1) accurately determining the peak load days and hours and 2) quantifying and reducing uncertainties associated with the forecast in probabilistic risk measures for dispatch decision-making. In this study, we develop a supervised machine learning approach to generate 1) the probability of the next operation day containing the peak hour of the month and 2) the probability of an hour to be the peak hour of the day. Guidance is provided on the preparation and augmentation of data as well as the selection of machine learning models and decision-making thresholds. The proposed approach is applied to the Duke Energy Progress system and successfully captures 69 peak days out of 72 testing months with a 3% exceedance probability threshold. On 90% of the peak days, the actual peak hour is among the 2 hours with the highest probabilities.
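The two decision rules mentioned in the abstract can be illustrated with a short sketch. It assumes that the "3% exceedance probability threshold" means a day is flagged when its predicted peak-day probability is at least 0.03, which is one reading of the abstract and not confirmed by this listing. The function names `flag_peak_day` and `top_hours` are hypothetical, and the probabilities are expected to come from some upstream model that is not shown.

```python
def flag_peak_day(p_day, threshold=0.03):
    """Return True when the predicted peak-day probability meets the threshold.

    The 0.03 default mirrors the 3% figure in the abstract; treating it as a
    simple >= cutoff is an assumption of this sketch.
    """
    return p_day >= threshold


def top_hours(hour_probs, k=2):
    """Return the indices of the k hours with the highest predicted probability.

    hour_probs: list of per-hour probabilities, index 0 being the first hour.
    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(range(len(hour_probs)), key=lambda h: hour_probs[h], reverse=True)
    return ranked[:k]
```

With `k=2`, `top_hours` returns the two candidate hours that the abstract reports as containing the actual peak hour on 90% of peak days.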

preprint, 2022, arXiv

Towards Layer-wise Image Vectorization

Image rasterization is a mature technique in computer graphics, while image vectorization, the reverse path of rasterization, remains a major challenge. Recent advanced deep learning-based models achieve vectorization and semantic interpolation of vector graphs and demonstrate a better topology of generating new figures. However, deep models cannot be easily generalized to out-of-domain testing data. The generated SVGs also contain complex and redundant shapes that are inconvenient for further editing. Specifically, the crucial layer-wise topology and fundamental semantics in images are still not well understood and thus not fully explored. In this work, we propose Layer-wise Image Vectorization, namely LIVE, to convert raster images to SVGs while preserving image topology. LIVE can generate compact SVG forms with layer-wise structures that are semantically consistent with human perspective. We progressively add new Bézier paths and optimize these paths with the layer-wise framework, newly designed loss functions, and a component-wise path initialization technique. Our experiments demonstrate that LIVE presents more plausible vectorized forms than prior works and can be generalized to new images. With the help of this newly learned topology, LIVE initiates human-editable SVGs for both designers and other downstream applications. Code is available at https://github.com/Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization.

preprint, 2020, arXiv

DCANet: Learning Connected Attentions for Convolutional Neural Networks

While the self-attention mechanism has shown promising results for many vision tasks, it only considers the current features at a time. We show that such a manner cannot take full advantage of the attention mechanism. In this paper, we present Deep Connected Attention Network (DCANet), a novel design that boosts attention modules in a CNN model without any modification of the internal structure. To achieve this, we interconnect adjacent attention blocks, making information flow among attention blocks possible. With DCANet, all attention blocks in a CNN model are trained jointly, which improves the ability of attention learning. Our DCANet is generic. It is not limited to a specific attention module or base network architecture. Experimental results on ImageNet and MS COCO benchmarks show that DCANet consistently outperforms the state-of-the-art attention modules with minimal additional computational overhead in all test cases. All code and models are made publicly available.

preprint, 2020, arXiv

Model Predictive Control of Discrete-Continuous Energy Systems via Generalized Disjunctive Programming

Generalized Disjunctive Programming (GDP) provides an alternative framework to model optimization problems with both discrete and continuous variables. The key idea behind GDP involves the use of logical disjunctions to represent discrete decisions in the continuous space, and logical propositions to denote algebraic constraints in the discrete space. Compared to traditional mixed-integer programming (MIP), the inherent logic structure in GDP yields tighter relaxations that are exploited by global branch and bound algorithms to improve solution quality. In this paper, we present a general GDP model for optimal control of hybrid systems that exhibit both discrete and continuous dynamics. Specifically, we use GDP to formulate a model predictive control (MPC) model for piecewise-affine systems with implicit switching logic. As an example, the GDP-based MPC approach is used as a supervisory control to improve energy efficiency in residential buildings with binary on/off, relay-based thermostats. A simulation study is used to demonstrate the validity of the proposed approach, and the improved solution quality compared to existing MIP-based control approaches.

preprint, 2020, arXiv

Polestar: An Intelligent, Efficient and National-Wide Public Transportation Routing Engine

Public transportation plays a critical role in people's daily life. It has been proven that public transportation is more environmentally sustainable, efficient, and economical than any other form of travel. However, due to the increasing expansion of transportation networks and more complex travel situations, people are having difficulties in efficiently finding the most preferred route from one place to another through public transportation systems. To this end, in this paper, we present Polestar, a data-driven engine for intelligent and efficient public transportation routing. Specifically, we first propose a novel Public Transportation Graph (PTG) to model the public transportation system in terms of various travel costs, such as time or distance. Then, we introduce a general route search algorithm coupled with an efficient station binding method for efficient route candidate generation. After that, we propose a two-pass route candidate ranking module to capture user preferences under dynamic travel situations. Finally, experiments on two real-world data sets demonstrate the advantages of Polestar in terms of both efficiency and effectiveness. Indeed, in early 2019, Polestar was deployed on Baidu Maps, one of the world's largest map services. To date, Polestar serves over 330 cities, answers over a hundred million queries each day, and achieves substantial improvement in user click ratio.

preprint, 2019, arXiv

Compressive spectral imaging based on hexagonal blue noise coded apertures

Coded aperture snapshot spectral imager (CASSI) is a computational imaging system that acquires a three-dimensional (3D) spectral data cube from a single or a few two-dimensional (2D) measurements. Binary random coded apertures with square pixels are primarily implemented in CASSI systems to modulate the spectral images in the spatial domain. The design and optimization of coded apertures has been shown to improve the imaging performance of these systems significantly. This work proposes a different approach to code design. Instead of traditional square tiled coded elements, hexagonal tiled elements are used. The dislocation between the binary hexagonal pixels on coded apertures and the square pixels on the detector introduces equivalent grey-scale spatial modulation to increase the degrees of freedom in the sensing matrix, thus further improving the spectral imaging performance. Then, this paper presents an optimal structure under the criterion of satisfying the restricted isometry property (RIP) with high probability, coined blue noise (BN) coded apertures. In addition, this paper studies and verifies the proposed hexagonal blue noise coded aperture method on a general CASSI system, where the resolution of the coded aperture is equivalent to that of the detector. Based on the RIP criterion, this paper theoretically proves the superiority of the hexagonal blue noise coded aperture over the traditional random coded aperture with square lattice.

preprint, 2013, arXiv

Defect annihilation and proliferation in active nematics

Liquid crystals inevitably possess topological defect excitations generated through boundary conditions, applied fields or in quenches to the ordered phase. In equilibrium, pairs of defects coarsen and annihilate as the uniform ground state is approached. Here we show that defects in active liquid crystals exhibit profoundly different behavior, depending on the degree of activity and its contractile or extensile character. While contractile systems enhance the annihilation dynamics of passive systems, extensile systems act to drive defects apart so that they swarm around in the manner of topologically well-characterized self-propelled particles. We develop a simple analytical model for the defect dynamics which reproduces the key features of both the numerical solutions and recent experiments on microtubule-kinesin assemblies.

preprint, 2012, arXiv

Planar sheets meet negative curvature liquid interfaces

If an inextensible thin sheet is adhered to a substrate with a negative Gaussian curvature, it will experience stress due to geometric frustration. We analyze the consequences of such geometric frustration using analytic arguments and numerical simulations. Both concentric wrinkles and eye-like folds are shown to be compatible with negative curvatures. Which pattern will be realized depends on the curvature of the substrate. We discuss both types of folding patterns and determine the phase diagram governing their appearance.

preprint, 2012, arXiv

The Electric Double Layer Structure Around Charged Spherical Interfaces

We derive a formally simple approximate analytical solution to the Poisson-Boltzmann equation for the spherical system via a geometric mapping. Its regime of applicability in the parameter space of the spherical radius and the surface potential is determined, and its superiority over the linearized solution is demonstrated.
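For context, the equation and the linearized solution the abstract refers to can be written out in their standard textbook form. The notation below is an assumption of this note, not taken from the paper: a symmetric z:z electrolyte, dimensionless potential $\psi = ze\phi/k_BT$, sphere radius $a$, surface potential $\psi_0$, and inverse Debye length $\kappa$. The paper's own approximate solution via geometric mapping is not reproduced here.

```latex
% Spherically symmetric Poisson-Boltzmann equation (symmetric electrolyte assumed)
\frac{1}{r^{2}}\frac{d}{dr}\!\left(r^{2}\frac{d\psi}{dr}\right) = \kappa^{2}\sinh\psi ,
\qquad r \ge a, \qquad \psi(a) = \psi_{0}, \qquad \psi(r \to \infty) = 0 .

% Linearized (Debye-Hueckel) limit, \sinh\psi \approx \psi, valid for |\psi_0| \ll 1
\psi(r) = \psi_{0}\,\frac{a}{r}\,e^{-\kappa (r - a)} .
```

The linearized form follows from substituting $u = r\psi$, which reduces the equation to $u'' = \kappa^{2}u$ with the decaying solution $u \propto e^{-\kappa r}$.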

preprint, 2011, arXiv

Molecular Tilt on Monolayer-Protected Nanoparticles

The structure of the tilted phase of monolayer-protected nanoparticles is investigated by means of a simple Ginzburg-Landau model. The theory contains two dimensionless parameters representing the preferential tilt angle and the ratio epsilon between the energy cost due to spatial variations in the tilt of the coating molecules and that of the van der Waals interactions, which favor uniform tilt. We analyze the model for both spherical and octahedral particles. On spherical particles, we find a transition from a tilted phase, at small epsilon, to a phase where the molecules spontaneously align along the surface normal and tilt disappears. Octahedral particles have an additional phase at small epsilon characterized by the presence of six topological defects. These defective configurations provide preferred sites for the chemical functionalization of monolayer-protected nanoparticles via place-exchange reactions and their consequent linking to form molecules and bulk materials.

preprint, 2010, arXiv

Effects of Prediction Feedback in Multi-Route Intelligent Traffic Systems

We first study the influence of an efficient feedback strategy named prediction feedback strategy (PFS) based on a multi-route scenario in which dynamic information can be generated and displayed on the board to guide road users to make a choice. In this scenario, our model incorporates the effects of adaptability into the cellular automaton models of traffic flow. Simulation results adopting this optimal information feedback strategy have demonstrated high efficiency in controlling spatial distribution of traffic patterns compared with the other three information feedback strategies, i.e., vehicle number and flux. At the end of this paper, we also discuss in what situation PFS will become invalid in multi-route systems.
