Source author record

Qi Xu

Qi Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, eess.AS, Machine Learning, Sound, astro-ph.GA, econ.EM, math.ST, Methodology, Multiagent Systems, Neural and Evolutionary Computing, physics.soc-ph, Statistics Theory

Catalog footprint

What is connected

10 works

12 topics

4 close collaborators

preprint · 2026 · arXiv

Masked Generative Transformer Is What You Need for Image Editing

Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach that leverages Masked Generative Transformers (MGTs), whose localized token-prediction paradigm naturally confines changes to the intended regions. We present EditMGT, the first MGT-based image editing framework. Our approach employs multi-layer attention consolidation to aggregate cross-attention maps into precise edit-localization signals, and region-hold sampling to explicitly prevent token flipping in non-target areas. To support training, we construct CrispEdit-2M, a high-resolution (>1024) editing dataset of 2M samples spanning seven categories. With only 960M parameters, EditMGT achieves state-of-the-art image similarity on multiple benchmarks while editing 6× faster, demonstrating that MGTs offer a compelling alternative to diffusion-based editing.
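The idea behind region-hold sampling can be illustrated with a minimal sketch of one iterative masked-decoding step in which tokens outside the edit region are pinned to the source image and can never flip. Everything here is an assumption rather than EditMGT's code: the hypothetical `MASK_ID`, the greedy "commit the most confident masked tokens" schedule (MaskGIT-style), and the `edit_region` mask, which EditMGT would derive from its attention consolidation but which is simply given here.

```python
import numpy as np

MASK_ID = -1  # hypothetical id marking a position that has not been decoded yet


def region_hold_step(source_tokens, current_tokens, logits, edit_region, n_unmask):
    """One hypothetical decoding step with a region hold.

    source_tokens:  (N,) token ids of the original image
    current_tokens: (N,) current canvas; MASK_ID marks undecoded positions
    logits:         (N, V) model scores for every position
    edit_region:    (N,) bool, True where edits are allowed
    n_unmask:       number of masked target positions to commit this step
    """
    tokens = current_tokens.copy()
    # Hold: positions outside the edit region always keep the source token.
    tokens[~edit_region] = source_tokens[~edit_region]

    # Softmax over the vocabulary to get per-position confidences.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)

    # Only still-masked positions inside the edit region may be filled.
    candidates = np.where((tokens == MASK_ID) & edit_region)[0]
    if candidates.size == 0:
        return tokens
    # Commit the most confident candidates first.
    order = candidates[np.argsort(-conf[candidates])]
    chosen = order[:n_unmask]
    tokens[chosen] = pred[chosen]
    return tokens
```

Because the hold is reapplied at every step, even a confident prediction for a non-target position is discarded, which is the "no token flipping" property the abstract describes.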

preprint · 2024 · arXiv

Enhancing Adaptive History Reserving by Spiking Convolutional Block Attention Module in Recurrent Neural Networks

Spiking neural networks (SNNs) are an efficient class of models for processing spatio-temporal patterns in time series, such as Address-Event Representation (AER) data collected from Dynamic Vision Sensors (DVS). Although convolutional SNNs have achieved remarkable performance on these AER datasets, benefiting from the strong spatial feature extraction of convolutional structures, they ignore temporal features related to sequential time points. In this paper, we develop a recurrent spiking neural network (RSNN) model embedded with an advanced spiking convolutional block attention module (SCBAM) to combine the spatial and temporal features of spatio-temporal patterns. Through SCBAM, the model adaptively invokes history information in spatial and temporal channels, which brings the advantages of efficient memory calling and history redundancy elimination. The performance of our model was evaluated on the DVS128-Gesture dataset and other time-series datasets. The experimental results show that the proposed SRNN-SCBAM model makes better use of history information in the spatial and temporal dimensions with less memory space, and achieves higher accuracy than other models.
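To make "attention over history" concrete, here is a minimal sketch of one recurrent leaky integrate-and-fire (LIF) step in which the previous spike map is reweighted per channel before being fed back. This is not the authors' SCBAM: it uses only the channel branch of the standard CBAM design (shared two-layer MLP over average- and max-pooled descriptors), omits the spatial branch, and the `decay`, `threshold` and weight shapes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """CBAM-style channel gate for a (C, H, W) map.

    w1: (R, C) and w2: (C, R) form a shared MLP applied to both pooled descriptors.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor
    return sigmoid(mlp(avg) + mlp(mx))  # (C,) weights in (0, 1)


def lif_step(v, x_in, s_prev, w1, w2, decay=0.5, threshold=1.0):
    """One recurrent LIF step; the previous spikes (the 'history') are
    gated per channel before being added back to the membrane potential."""
    gate = channel_attention(s_prev, w1, w2)
    recurrent = gate[:, None, None] * s_prev
    v = decay * v + x_in + recurrent      # leaky integration
    s = (v >= threshold).astype(float)    # fire where the threshold is reached
    v = v * (1.0 - s)                     # hard reset after a spike
    return v, s
```

A gate near zero suppresses a channel's history, which is one plausible reading of "history redundancy elimination"; the paper's actual formulation may differ.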

preprint · 2023 · arXiv

The Arrow of Time in Music -- Revisiting the Temporal Structure of Music with Distinguishability and Unique Orientability as the Anchor Point

Taking "the arrow of time" as its general theme, the article develops a musical discussion by referring to the etymological origins of the term in philosophy (epistemology) and physics (thermodynamics). In particular, the article explores two specific conditions, distinguishability and unique orientability, from which it derives respective musical propositions and case studies. For the distinguishability condition, the article focuses on "recurrence" in music and interprets Bach's Christmas Oratorio from the perspective of "birth/resurrection". For the unique orientability condition, the article discusses the process of delaying the climax and thereby proposes an "AB-AAB left-replication" model, implying an organicist view that treats the temporal structure of music (e.g., form) as the product of a dynamic process: organic growth.

preprint · 2022 · arXiv

Deep Auto-encoder with Neural Response

Artificial neural networks (ANNs) are a versatile tool for studying neural representation in the ventral visual stream, and knowledge from neuroscience in turn inspires ANN models that perform better on the task. However, it is still unclear how to merge these two directions into a unified framework. In this study, we propose an integrated framework called Deep Autoencoder with Neural Response (DAE-NR), which incorporates information from the ANN and the visual cortex to achieve better image reconstruction performance and higher representation similarity between biological and artificial neurons. The same visual stimuli (i.e., natural images) are presented to both the mouse brain and DAE-NR. The encoder of DAE-NR jointly learns the dependencies from neural spike encoding and image reconstruction. For the neural spike encoding task, the features derived from a specific hidden layer of the encoder are transformed by a mapping function to predict the ground-truth neural response under the constraint of image reconstruction. Simultaneously, for the image reconstruction task, the latent representation obtained by the encoder is passed to a decoder that restores the original image under the guidance of neural information. In DAE-NR, the learning of the encoder, the mapping function and the decoder is implicitly constrained by both tasks. Our experiments demonstrate that only with joint learning can DAE-NR improve visual image reconstruction and increase the representation similarity between biological and artificial neurons. DAE-NR offers a new perspective on the integration of computer vision and neuroscience.
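The joint training described above amounts to one objective with a reconstruction term and a neural-prediction term. The sketch below is a hedged illustration only: the linear `mapping`, the mean-squared error for both terms and the weight `lam` are assumptions, since the abstract does not specify the mapping function or the loss forms (a spike-count likelihood such as Poisson would be an equally plausible choice for the neural term).

```python
import numpy as np


def dae_nr_loss(image, reconstruction, hidden, mapping, neural_response, lam=1.0):
    """Hypothetical joint objective: image reconstruction + neural response prediction.

    image, reconstruction: arrays of the same shape
    hidden:          (n_features,) features from one encoder hidden layer
    mapping:         (n_neurons, n_features) assumed linear map to predicted responses
    neural_response: (n_neurons,) recorded responses to the same stimulus
    lam:             assumed weight of the neural term
    """
    rec_loss = np.mean((image - reconstruction) ** 2)
    predicted = mapping @ hidden
    neural_loss = np.mean((predicted - neural_response) ** 2)
    return rec_loss + lam * neural_loss
```

Since the encoder's hidden features feed both terms, minimizing the sum lets each task constrain what the encoder learns, which matches the abstract's point that the encoder, mapping function and decoder are constrained by both tasks.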

preprint · 2022 · arXiv

The Musical Arrow of Time -- The Role of Temporal Asymmetry in Music and Its Organicist Implications

Adopting a performer-centric perspective, we frequently encounter two statements: "music flows" and "music is life-like". This dissertation builds on these two statements, exploring the role of temporal asymmetry in music (generalizing "music flows") and its relation to the idea of organicism (generalizing "music is life-like"). We focus on two aspects of temporal asymmetry. The first aspect concerns the vastly different epistemic mechanisms by which we obtain knowledge of the past and of the future. A particular musical consequence follows: recurrence. The epistemic difference between the past and the future shapes our experience and interpretation of recurring events in music. The second aspect concerns the arrow of time: the unambiguous ordering imposed on temporal events gives rise to the a priori pointedness of time, rendering time asymmetrical and irreversible. A discussion of thermodynamics informs us musically: the arrow of time manifests itself in musical forms by delaying the placement of the climax. Organicism serves as a mediating topic, engaging with the concept of life as found in organisms. On the one hand, organicism is related to temporal asymmetry in science via a thermodynamic interpretation of living organisms as entropy-reducing entities. On the other hand, organicism is native to music via the universally acknowledged artistic idea that music should be interpreted as a vital force possessing volitional power. With organicism as a mediator, we better understand the role of temporal asymmetry in music. In particular, we view musical form as a process of expansion and elaboration analogous to organic growth. Finally, we present an organicist interpretation of delaying the climax: viewing musical form as the result of organic growth, the arrow of time translates into a preference for prepending structure over appending structure.

preprint · 2021 · arXiv

The Stellar "Snake" I: Whole Structure and Properties

To complement our previous discovery of the young snake-like structure in the solar neighborhood and reveal the structure's full extent, we build two samples of stars within the Snake and its surrounding territory from Gaia EDR3. With the friends-of-friends algorithm, we identify 2694 and 9615 Snake member candidates from the two samples. Thirteen open clusters are embedded among these member candidates. By combining spectroscopic data from multiple surveys, we investigate the comprehensive properties of the candidates and find that they are very likely to belong to one sizable structure, since most of the components are well bridged in their spatial distributions and follow a single stellar population with an age of 30 to 40 Myr and solar metallicity. This sizable structure is best explained as hierarchically primordial, and probably formed from a filamentary giant molecular cloud with a unique formation history in localized regions. To analyze the dynamics of the Snake, we divide the structure into five groups according to their tangential velocities; we find that the groups are expanding at a coherent rate (κ_X ∼ 3.0 × 10⁻² km s⁻¹ pc⁻¹) along the length of the structure (the X direction). The corresponding expansion age (τ ∼ 33 Myr) is highly consistent with the age of the Snake. With over ten thousand member stars, the Snake is an ideal laboratory for studying nearby coeval star formation, stellar physics and environmental evolution over a large spatial extent.
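Two steps in this abstract can be sketched briefly: friends-of-friends (FoF) grouping, which links any two objects closer than a linking length and takes connected components, and the conversion of an expansion rate into an age. The sketch assumes a plain Euclidean FoF on already-scaled coordinates (the paper's actual feature space and linking length are not given here), and it assumes the expansion age is τ = 1/κ. Under that assumption, κ = 3.0 × 10⁻² km s⁻¹ pc⁻¹ gives about 32.6 Myr, consistent with the quoted τ ∼ 33 Myr.

```python
import numpy as np

KM_PER_PC = 3.0857e13     # kilometres in one parsec
SEC_PER_MYR = 3.15576e13  # seconds in one million Julian years


def friends_of_friends(points, linking_length):
    """Label points so that any two closer than linking_length share a group,
    transitively. O(N^2) union-find: fine for a sketch, slow for large catalogs."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        for j in np.nonzero(d < linking_length)[0] + i + 1:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri  # merge the two groups
    return [find(i) for i in range(n)]


def expansion_age_myr(kappa_km_s_pc):
    """Assumed tau = 1 / kappa, with kappa in km s^-1 pc^-1, returned in Myr."""
    seconds = KM_PER_PC / kappa_km_s_pc  # (km/pc) / (km s^-1 pc^-1) = s
    return seconds / SEC_PER_MYR
```

For catalogs of thousands of stars, a spatial index (for example a k-d tree) would replace the quadratic pair loop; the grouping rule stays the same.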

preprint · 2020 · arXiv

Covariate Distribution Balance via Propensity Scores

This paper proposes new estimators for the propensity score that aim to maximize the covariate distribution balance among different treatment groups. Heuristically, our proposed procedure attempts to estimate a propensity score model by making the underlying covariate distribution of different treatment groups as close to each other as possible. Our estimators are data-driven, do not rely on tuning parameters such as bandwidths, admit an asymptotic linear representation, and can be used to estimate different treatment effect parameters under different identifying assumptions, including unconfoundedness and local treatment effects. We derive the asymptotic properties of inverse probability weighted estimators for the average, distributional, and quantile treatment effects based on the proposed propensity score estimator and illustrate their finite sample performance via Monte Carlo simulations and two empirical applications.
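Once propensity scores are estimated, the inverse probability weighted (IPW) estimators mentioned above reweight each group by the inverse of its treatment probability. The sketch below is a generic normalized (Hájek-type) IPW estimate of the average treatment effect. It takes the propensity scores `p` as given and does not implement the paper's covariate-distribution-balance estimator; the normalized form is an assumption, and the paper may use a different weighting.

```python
import numpy as np


def ipw_ate(y, d, p):
    """Normalized IPW estimate of the average treatment effect.

    y: outcomes; d: 0/1 treatment indicators;
    p: estimated propensity scores P(D = 1 | X), strictly between 0 and 1.
    """
    y, d, p = (np.asarray(a, dtype=float) for a in (y, d, p))
    w1 = d / p              # weights for treated units
    w0 = (1 - d) / (1 - p)  # weights for control units
    treated_mean = np.sum(w1 * y) / np.sum(w1)
    control_mean = np.sum(w0 * y) / np.sum(w0)
    return treated_mean - control_mean
```

With constant propensity scores, the weights cancel and the estimate reduces to the simple difference in group means. Any improvement in covariate balance from a better propensity score model enters only through `p`.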

preprint · 2020 · arXiv

SketchyCOCO: Image Generation from Freehand Scene Sketches

We introduce the first method for automatic image generation from scene-level freehand sketches. Our model allows for controllable image generation by specifying the synthesis goal via freehand sketches. The key contribution is an attribute-vector-bridged Generative Adversarial Network called EdgeGAN, which supports high-visual-quality object-level image content generation without using freehand sketches as training data. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution. We validate our approach on both object-level and scene-level image generation on SketchyCOCO. Through quantitative and qualitative results, human evaluation and ablation studies, we demonstrate the method's capacity to generate realistic, complex scene-level images from various freehand sketches.

preprint · 2016 · arXiv

Network structure of subway passenger flows

The results of transportation infrastructure network analyses have been used to analyze complex networks in a topological context. However, most modeling approaches, including those based on complex network theory, do not fully account for real-life traffic patterns and may provide an incomplete view of network functions. This study utilizes trip data obtained from the Beijing Subway System to characterize individual passenger movement patterns. A directed weighted passenger flow network was constructed from the subway infrastructure network topology by incorporating trip data. The passenger flow networks exhibit several properties that can be characterized by power-law distributions based on flow size, and log-logistic distributions based on the fraction of boarding and departing passengers. The study also characterizes the temporal patterns of in-transit and waiting passengers and provides a hierarchical clustering structure for passenger flows. This hierarchical flow organization varies in the spatial domain. Ten cluster groups were identified, indicating a hierarchical urban polycentric structure composed of large concentrated flows at urban activity centers. These empirical findings provide insights regarding urban human mobility patterns within a large subway network.
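Building a directed weighted passenger flow network from trip data can be sketched as routing each origin-destination trip over the infrastructure graph and adding one unit of flow to every directed link on its route. The assumptions here are loud ones: passengers take the route with the fewest stations (breadth-first search), trips are bare `(origin, destination)` pairs, and station names are hypothetical. Real route choice, transfer penalties and smart-card record formats are not modeled.

```python
from collections import defaultdict, deque


def shortest_path(adj, src, dst):
    """Fewest-station path in a directed graph given as {station: [next stations]}."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None  # destination unreachable
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def passenger_flow_network(adj, trips):
    """Accumulate each (origin, destination) trip onto the links of its route.

    Returns {(u, v): passengers} for directed links with nonzero flow."""
    flow = defaultdict(int)
    for origin, dest in trips:
        path = shortest_path(adj, origin, dest)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            flow[(u, v)] += 1
    return dict(flow)
```

The resulting link weights are what a distributional analysis such as the power-law fits above would be computed on; time-stamped trips would let the same accumulation be done per time window.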

preprint · 2014 · arXiv

Effects of Crowding Perception on Self-organized Pedestrian Flows Using Adaptive Agent-based Model

Pedestrian behavior has much more complicated characteristics in a dense crowd and thus attracts widespread interest from scientists and engineers. However, even successful modeling approaches, such as pedestrian models based on particle systems, still do not fully consider the perceptive mechanism underlying collective pedestrian behavior. This paper extends a behavioral heuristics-based pedestrian model to an adaptive agent-based model, which explicitly considers the crowding effect of neighboring individuals and perception anisotropy in the representation of a pedestrian's visual information. Adaptive agents with crowding perception are constructed to investigate the complex, self-organized collective dynamics of pedestrian motion. The proposed model simulates self-organized pedestrian flows in good quantitative agreement with empirical data. The self-organized phenomena include lane formation in bidirectional flow and the fundamental diagrams of unidirectional flow. Simulation results show that the emergence of lane formation in bidirectional flow can be well reproduced. Further investigation shows that increasing the view distance has a significant effect on reducing the number of lanes, increasing lane width and stabilizing the self-organized lanes. The paper also discusses phase transitions in the fundamental diagrams of pedestrian crowds with unidirectional flow. Heterogeneity in how pedestrians perceive crowding is found to have a remarkable impact on flow quality, leading to the buildup of congestion and a rapid decrease in the efficiency of pedestrian flows. This also suggests that the concept of heterogeneity may be used to explain the instability of phase transitions.
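A behavioral heuristics-based model typically lets each pedestrian scan candidate headings within its vision field and pick the one that brings it closest to its destination, given how far it can walk along each heading before a collision. The sketch below assumes the base model is the cognitive heuristic of Moussaïd et al. (2011), where the squared distance to the destination along heading α is d_max² + f(α)² − 2·d_max·f(α)·cos(α₀ − α). The paper's crowding perception and anisotropy extensions are not implemented; they would change how `free_distance` (a hypothetical callable) is computed.

```python
import math


def choose_heading(alpha0, d_max, free_distance, phi, n_dirs=41):
    """Pick the heading that minimizes the remaining distance to the destination.

    alpha0:        direction of the destination (radians)
    d_max:         view distance
    free_distance: callable alpha -> distance walkable before a collision
    phi:           half-angle of the vision field (radians)
    n_dirs:        number of evenly spaced candidate headings (assumed discretization)
    """
    best_alpha, best_d2 = alpha0, float("inf")
    for k in range(n_dirs):
        alpha = alpha0 - phi + 2 * phi * k / (n_dirs - 1)
        f = min(free_distance(alpha), d_max)  # perception is capped at d_max
        # Law of cosines: squared distance left to the destination after walking f along alpha.
        d2 = d_max ** 2 + f ** 2 - 2 * d_max * f * math.cos(alpha0 - alpha)
        if d2 < best_d2:
            best_alpha, best_d2 = alpha, d2
    return best_alpha
```

With an unobstructed view the heading equals the destination direction; a blocked central sector pushes the choice to the nearest free heading, which is the local mechanism from which lanes can emerge in bidirectional flow.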