Source author record

Zhe Su

Zhe Su appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Neural and Evolutionary Computing Artificial Intelligence Computer Vision Human-Computer Interaction math.DG

Catalog footprint

What is connected

6works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive-achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. A brief overview is available at https://z.ai/blog/glm-4.6v. Code, models and more information are released at https://github.com/zai-org/GLM-V.

preprint2026arXiv

Unsupervised Graph Modeling for Anomaly Detection in Accounting Subject Relationships

This paper addresses the problem of anomaly detection in accounting subject association structures, proposing a structured modeling and unsupervised discriminant framework based on graph neural networks. This framework is used to mine stable correspondences between subjects and identify structural deviations from general ledger details and voucher entries. The method first abstracts accounting subjects as graph nodes, and the co-occurrence and debit/credit correspondence of subjects in the same business record are abstracted as weighted edges. The edge weights are characterized by statistical measures such as co-occurrence frequency or amount aggregation, thus forming a period-level accounting subject association graph. In the representation learning stage, a message passing mechanism is used to fuse the node's own attributes and neighborhood context to obtain node embeddings containing structural information. In the anomaly detection stage, the rationality of subject pair connections is estimated through a relation reconstruction decoder, and edge-level anomaly scores are defined based on the degree of deviation in reconstruction probabilities. These scores are then aggregated to obtain node-level risk ranking and local anomaly localization. This framework can simultaneously capture local substructure anomalies and cross-community anomaly connections without relying on anomaly labeling, outputting traceable subject pair risk clues. Comparative experiments demonstrate more stable comprehensive discriminant capabilities and higher top-ranking accuracy.

preprint2025arXiv

PlotGen-Bench: Evaluating VLMs on Generating Visualization Code from Diverse Plots across Multiple Libraries

Recent advances in vision-language models (VLMs) have expanded their multimodal code generation capabilities, yet their ability to generate executable visualization code from plots, especially for complex 3D, animated, plot-to-plot transformations, or multi-library scenarios, remains underexplored. To address this gap, we introduce PlotGen-Bench, a comprehensive benchmark for evaluating plot-to-code generation under realistic and complex visualization scenarios. The benchmark spans 9 major categories, 30 subcategories, and 3 core tasks-plot replication, plot transformation, and multi-library generation, covering both 2D, 3D and animated plots across 5 widely used visualization libraries. Through systematic evaluation of state-of-the-art open- and closed-source VLMs, we find that open-source models still lag considerably behind in visual fidelity and semantic consistency, despite achieving comparable code executability. Moreover, all models exhibit substantial degradation on reasoning-intensive tasks such as chart type conversion and animation generation. PlotGen-Bench establishes a rigorous foundation for advancing research toward more capable and reliable VLMs for visualization authoring and code synthesis, with all data and code available at https://plotgen.github.io.

preprint2022arXiv

A hardware-software co-design approach to minimize the use of memory resources in multi-core neuromorphic processors

Both in electronics and biology, physical implementations of neural networks have severe energy and memory constraints. We propose a hardware-software co-design approach for minimizing the use of memory resources in multi-core neuromorphic processors, by taking inspiration from biological neural networks. We use this approach to design new routing schemes optimized for small-world networks and to provide guidelines for designing novel application-specific multi-core neuromorphic chips. Starting from the hierarchical routing scheme proposed, we present a hardware-aware placement algorithm that optimizes the allocation of resources for arbitrary network models. We validate the algorithm with a canonical small-world network and present preliminary results for other networks derived from it.

preprint2022arXiv

Cortical-inspired placement and routing: minimizing the memory resources in multi-core neuromorphic processors

Brain-inspired event-based neuromorphic processing systems have emerged as a promising technology in particular for bio-medical circuits and systems. However, both neuromorphic and biological implementations of neural networks have critical energy and memory constraints. To minimize the use of memory resources in multi-core neuromorphic processors, we propose a network design approach inspired by biological neural networks. We use this approach to design a new routing scheme optimized for small-world networks and, at the same time, to present a hardware-aware placement algorithm that optimizes the allocation of resources for small-world network models. We validate the algorithm with a canonical small-world network and present preliminary results for other networks derived from it

preprint2020arXiv

A diffeomorphism-invariant metric on the space of vector-valued one-forms

In this article we introduce a diffeomorphism-invariant Riemannian metric on the space of vector valued one-forms. The particular choice of metric is motivated by potential future applications in the field of functional data and shape analysis and by connections to the Ebin metric on the space of all Riemannian metrics. In the present work we calculate the geodesic equations and obtain an explicit formula for the solutions to the corresponding initial value problem. Using this we show that it is a geodesically and metrically incomplete space and study the existence of totally geodesic subspaces. Furthermore, we calculate the sectional curvature and observe that, depending on the dimension of the base manifold and the target space, it either has a semidefinite sign or admits both signs.