Researcher profile

Ye Ma

Ye Ma contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
6 topics
4 close collaborators

Published work

7 published items

Preprint, 2026, arXiv

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Recent progress in text-to-image (T2I) diffusion models (DMs) has enabled high-quality visual synthesis from diverse textual prompts. Yet, most existing T2I DMs, even those equipped with large language model (LLM)-based text encoders, remain text-pixel mappers -- they employ LLMs merely as text encoders, without leveraging their inherent reasoning capabilities to infer what should be visually depicted given the textual prompt. To move beyond such literal generation, we propose the think-then-generate (T2G) paradigm, where the LLM-based text encoder is encouraged to reason about and rewrite raw user prompts; the states of the rewritten prompts then serve as diffusion conditioning. To achieve this, we first activate the think-then-rewrite pattern of the LLM encoder with a lightweight supervised fine-tuning process. Subsequently, the LLM encoder and diffusion backbone are co-optimized to ensure faithful reasoning about the context and accurate rendering of the semantics via Dual-GRPO. In particular, the text encoder is reinforced using image-grounded rewards to infer and recall world knowledge, while the diffusion backbone is pushed to produce semantically consistent and visually coherent images. Experiments show substantial improvements in factual consistency, semantic alignment, and visual realism across reasoning-based image generation and editing benchmarks, achieving 0.79 on WISE score, nearly on par with GPT-4. Our results constitute a promising step toward next-generation unified models with reasoning, expression, and demonstration capacities.

Preprint, 2025, arXiv

Muscle Synergy Patterns During Running: Coordinative Mechanisms From a Neuromechanical Perspective

Running is a fundamental form of human locomotion and a key task for evaluating neuromuscular control and lower-limb coordination. In recent years, muscle synergy analysis based on surface electromyography (sEMG) has become an important approach in this area. This review focuses on muscle synergies during running, outlining core neural control theories and biomechanical optimization hypotheses, summarizing commonly used decomposition methods (e.g., PCA, ICA, FA, NMF) and emerging autoencoder-based approaches. We synthesize findings on the development and evolution of running-related synergies across the lifespan, examine how running surface, speed, foot-strike pattern, fatigue, and performance level modulate synergy patterns, and describe characteristic alterations in populations with knee osteoarthritis, patellofemoral pain, and stroke. Current evidence suggests that the number and basic structure of lower-limb synergies during running are relatively stable, whereas spatial muscle weightings and motor primitives are highly plastic and sensitive to task demands, fatigue, and pathology. However, substantial methodological variability remains in EMG channel selection, preprocessing pipelines, and decomposition algorithms, and direct neurophysiological validation and translational application are still limited. Future work should prioritize standardized processing protocols, integration of multi-source neuromusculoskeletal data, nonlinear modeling, and longitudinal intervention studies to better exploit muscle synergy analysis in sports biomechanics, athletic training, and rehabilitation medicine.

Preprint, 2022, arXiv

A Novel Distributed Representation of News (DRNews) for Stock Market Predictions

In this study, a novel Distributed Representation of News (DRNews) model is developed and applied in deep learning-based stock market predictions. With the merit of integrating contextual information and cross-documental knowledge, the DRNews model creates news vectors that describe both the semantic information and potential linkages among news events through an attributed news network. Two stock market prediction tasks, namely the short-term stock movement prediction and stock crises early warning, are implemented in the framework of the attention-based Long Short-Term Memory (LSTM) network. It is suggested that DRNews substantially enhances the results of both tasks compared with five baselines of news embedding models. Further, the attention mechanism suggests that short-term stock trend and stock market crises both receive influences from daily news, with the former demonstrating more critical responses on the information related to the stock market per se, whilst the latter draws more concerns on the banking sector and economic policies.

Preprint, 2022, arXiv

Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs

In this paper, we study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images. We note that image compositions, which contain not only global semantics but also spatial information, would largely affect layout results. Hence, we propose a deep generative model, dubbed composition-aware graphic layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images. To obtain training images from images that already contain manually designed graphic layout data, previous work suggests masking design elements (e.g., texts and embellishments) as model inputs, which inevitably leaves hints of the ground truth. We study the misalignment between the training inputs (with hint masks) and test inputs (without masks), and design a novel domain alignment module (DAM) to narrow this gap. For training, we built a large-scale layout dataset which consists of 60,548 advertising posters with annotated layout information. To evaluate the generated layouts, we propose three novel metrics according to aesthetic intuitions. Through both quantitative and qualitative evaluations, we demonstrate that the proposed model can synthesize high-quality graphic layouts according to image compositions.

Preprint, 2022, arXiv

Geometry Aligned Variational Transformer for Image-conditioned Layout Generation

Layout generation is a novel task in computer vision, which combines the challenges in both object localization and aesthetic appraisal, widely used in advertisement, poster, and slide design. An accurate and pleasant layout should consider both the intra-domain relationship within layout elements and the inter-domain relationship between layout elements and the image. However, most previous methods simply focus on image-content-agnostic layout generation, without leveraging the complex visual information from the image. To this end, we explore a novel paradigm entitled image-conditioned layout generation, which aims to add text overlays to an image in a semantically coherent manner. Specifically, we propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image. First, a self-attention mechanism is adopted to model the contextual relationship within layout elements, while a cross-attention mechanism is used to fuse the visual information of conditional images. Subsequently, we take them as building blocks of a conditional variational autoencoder (CVAE), which demonstrates appealing diversity. Second, in order to alleviate the gap between the layout element domain and the visual domain, we design a Geometry Alignment module, in which the geometric information of the image is aligned with the layout representation. In addition, we construct a large-scale advertisement poster layout design dataset with delicate layout and saliency map annotations. Experimental results show that our model can adaptively generate layouts in the non-intrusive area of the image, resulting in a harmonious layout design.

Preprint, 2022, arXiv

Integrated Node Encoder for Labelled Textual Networks

Voluminous works have been implemented to exploit content-enhanced network embedding models, with little focus on the labelled information of nodes. Although TriDNR leverages node labels by treating them as node attributes, it fails to enrich unlabelled node vectors with the labelled information, which leads to the weaker classification result on the test set in comparison to existing unsupervised textual network embedding models. In this study, we design an integrated node encoder (INE) for textual networks which is jointly trained on the structure-based and label-based objectives. As a result, the node encoder preserves the integrated knowledge of not only the network text and structure, but also the labelled information. Furthermore, INE allows the creation of label-enhanced vectors for unlabelled nodes by entering their node contents. Our node embedding achieves state-of-the-art performances in the classification task on two public citation networks, namely Cora and DBLP, pushing benchmarks up by 10.0% and 12.1%, respectively, with the 70% training ratio. Additionally, a feasible solution that generalizes our model from textual networks to a broader range of networks is proposed.

Preprint, 2022, arXiv

Parallel Hierarchical Transformer with Attention Alignment for Abstractive Multi-Document Summarization

In comparison to single-document summarization, abstractive Multi-Document Summarization (MDS) brings challenges on the representation and coverage of its lengthy and linked sources. This study develops a Parallel Hierarchical Transformer (PHT) with attention alignment for MDS. By incorporating word- and paragraph-level multi-head attentions, the hierarchical architecture of PHT allows better processing of dependencies at both token and document levels. To guide the decoding towards a better coverage of the source documents, the attention-alignment mechanism is then introduced to calibrate beam search with predicted optimal attention distributions. Based on the WikiSum data, a comprehensive evaluation is conducted to test improvements on MDS by the proposed architecture. By better handling the inner- and cross-document information, results in both ROUGE and human evaluation suggest that our hierarchical model generates summaries of higher quality relative to other Transformer-based baselines at relatively low computational cost.