Researcher profile

Qi Xu

Qi Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Masked Generative Transformer Is What You Need for Image Editing

Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized token-prediction paradigm naturally confines changes to intended regions. We present EditMGT, an MGT-based editing framework that is the first of its kind. Our approach employs multi-layer attention consolidation to aggregate cross-attention maps into precise edit localization signals, and region-hold sampling to explicitly prevent token flipping in non-target areas. To support training, we construct CrispEdit-2M, a 2M-sample high-resolution (>1024) editing dataset spanning seven categories. With only 960M parameters, EditMGT achieves state-of-the-art image similarity on multiple benchmarks while delivering 6x faster editing, demonstrating that MGTs offer a compelling alternative to diffusion-based editing.

preprint2024arXiv

Enhancing Adaptive History Reserving by Spiking Convolutional Block Attention Module in Recurrent Neural Networks

Spiking neural networks (SNNs) serve as one type of efficient model to process spatio-temporal patterns in time series, such as the Address-Event Representation data collected from Dynamic Vision Sensor (DVS). Although convolutional SNNs have achieved remarkable performance on these AER datasets, benefiting from the predominant spatial feature extraction ability of convolutional structure, they ignore temporal features related to sequential time points. In this paper, we develop a recurrent spiking neural network (RSNN) model embedded with an advanced spiking convolutional block attention module (SCBAM) component to combine both spatial and temporal features of spatio-temporal patterns. It invokes the history information in spatial and temporal channels adaptively through SCBAM, which brings the advantages of efficient memory calling and history redundancy elimination. The performance of our model was evaluated in DVS128-Gesture dataset and other time-series datasets. The experimental results show that the proposed SRNN-SCBAM model makes better use of the history information in spatial and temporal dimensions with less memory space, and achieves higher accuracy compared to other models.

preprint2023arXiv

The Arrow of Time in Music -- Revisiting the Temporal Structure of Music with Distinguishability and Unique Orientability as the Anchor Point

Driven by the term "the arrow of time" as a general topic, the article develops a musical discussion by referring to the etymological origin of the term: philosophy (epistemology) and physics (thermodynamics). In particular, the article explores two specific conditions: distinguishability and unique orientability, from which the article derives respective musical propositions and case studies. For the distinguishability condition, the article focuses on the "recurrence" in music and tries to interpret Bach's Christmas Oratorio from the perspective of "birth/resurrection". For the unique orientability condition, the article discusses the process of delaying the climax, thereby proposing "AB-AAB left-replication" model, implying an organicist view by treating the temporal structure of music (e.g. form) as the product of a dynamic process: organic growth.

preprint2022arXiv

Deep Auto-encoder with Neural Response

Artificial neural network (ANN) is a versatile tool to study the neural representation in the ventral visual stream, and the knowledge in neuroscience in return inspires ANN models to improve performance in the task. However, it is still unclear how to merge these two directions into a unified framework. In this study, we propose an integrated framework called Deep Autoencoder with Neural Response (DAE-NR), which incorporates information from ANN and the visual cortex to achieve better image reconstruction performance and higher neural representation similarity between biological and artificial neurons. The same visual stimuli (i.e., natural images) are input to both the mice brain and DAE-NR. The encoder of DAE-NR jointly learns the dependencies from neural spike encoding and image reconstruction. For the neural spike encoding task, the features derived from a specific hidden layer of the encoder are transformed by a mapping function to predict the ground-truth neural response under the constraint of image reconstruction. Simultaneously, for the image reconstruction task, the latent representation obtained by the encoder is assigned to a decoder to restore the original image under the guidance of neural information. In DAE-NR, the learning process of encoder, mapping function and decoder are all implicitly constrained by these two tasks. Our experiments demonstrate that if and only if with the joint learning, DAE-NRs can improve the performance of visual image reconstruction and increase the representation similarity between biological neurons and artificial neurons. The DAE-NR offers a new perspective on the integration of computer vision and neuroscience.

preprint2022arXiv

The Musical Arrow of Time -- The Role of Temporal Asymmetry in Music and Its Organicist Implications

Adopting a performer-centric perspective, we frequently encounter two statements: "music flows", and "music is life-like". This dissertation builds on top of the two statements above, resulting in an exploration of the role of temporal asymmetry in music (generalizing "music flows") and its relation to the idea of organicism (generalizing "music is life-like"). We focus on two aspects of temporal asymmetry. The first aspect concerns the vastly different epistemic mechanisms with which we obtain knowledge of the past and the future. A particular musical consequence follows: recurrence. The epistemic difference between the past and the future shapes our experience and interpretation of recurring events in music. The second aspect concerns the arrow of time: the unambiguous ordering imposed on temporal events gives rise to the a priori pointedness of time, rendering time asymmetrical and irreversible. A discussion on thermodynamics informs us musically: the arrow of time effectuates itself in musical forms by delaying the placement of the climax. Organicism serves as a mediating topic, engaging with the concept of life as in organisms. On the one hand, organicism is related to temporal asymmetry in science via a thermodynamical interpretation of life as entropy-reducing entities. On the other hand, organicism is a topic native to music via the universally acknowledged artistic idea that music should be interpreted as a vital force possessing volitional power. With organicism as a mediator, we better understand the role of temporal asymmetry in music. In particular, we view musical form as a process of expansion and elaboration analogous to organic growth. Finally, we present an organicist interpretation of delaying the climax: viewing musical form as the result of organic growth, the arrow of time translates to a preference for prepending structure over appending structure.

preprint2021arXiv

The Stellar "Snake" I: Whole Structure and Properties

To complement our previous discovery of the young snake-like structure in the solar neighborhood and reveal the structure's full extent, we build two samples of stars within the Snake and its surrounding territory from {\tt Gaia EDR3}. With the friends-of-friends algorithm, we identify 2694 and 9615 Snake member candidates from the two samples. Thirteen open clusters are embedded in these member candidates. By combining the spectroscopic data from multiple surveys, we investigate the comprehensive properties of the candidates and find that they \thj{are very likely to} belong to one sizable structure, since most of the components are well bridged in their spatial distributions, and follow a single stellar population with an age of $30-40$\,Myr and solar metallicity. This sizable structure is best explained as hierarchically primordial, and probably formed from a filamentary giant molecular cloud with unique formation history in localized regions. To analyze the dynamics of the Snake, we divide the structure into five groups according to their tangential velocities; we find that the groups are expanding at a coherent rate ($κ_X\sim3.0\,\times10^{-2}\,\rm km\,s^{-1}\,pc^{-1}$) along the length of the structure ($X$-direction). \thj{The corresponding expansion age ($τ\sim33$\,Myr) is highly consistent with the age of the Snake}. With over ten thousand member stars, the Snake is an ideal laboratory to study nearby coeval stellar formation, stellar physics, and environmental evolution over a large spatial extent.

preprint2020arXiv

Covariate Distribution Balance via Propensity Scores

This paper proposes new estimators for the propensity score that aim to maximize the covariate distribution balance among different treatment groups. Heuristically, our proposed procedure attempts to estimate a propensity score model by making the underlying covariate distribution of different treatment groups as close to each other as possible. Our estimators are data-driven, do not rely on tuning parameters such as bandwidths, admit an asymptotic linear representation, and can be used to estimate different treatment effect parameters under different identifying assumptions, including unconfoundedness and local treatment effects. We derive the asymptotic properties of inverse probability weighted estimators for the average, distributional, and quantile treatment effects based on the proposed propensity score estimator and illustrate their finite sample performance via Monte Carlo simulations and two empirical applications.

preprint2020arXiv

SketchyCOCO: Image Generation from Freehand Scene Sketches

We introduce the first method for automatic image generation from scene-level freehand sketches. Our model allows for controllable image generation by specifying the synthesis goal via freehand sketches. The key contribution is an attribute vector bridged Generative Adversarial Network called EdgeGAN, which supports high visual-quality object-level image content generation without using freehand sketches as training data. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution. We validate our approach on the tasks of both object-level and scene-level image generation on SketchyCOCO. Through quantitative, qualitative results, human evaluation and ablation studies, we demonstrate the method's capacity to generate realistic complex scene-level images from various freehand sketches.