Source author record

Yutao Zhang

Yutao Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Social and Information Networks, Artificial Intelligence, astro-ph.HE, Machine Learning, physics.soc-ph

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including coding and GUI agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, outperforming the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. A brief overview is available at https://z.ai/blog/glm-4.6v. Code, models and more information are released at https://github.com/zai-org/GLM-V.

preprint · 2026 · arXiv

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

preprint · 2020 · arXiv

A framework for constructing a huge name disambiguation dataset: algorithms, visualization and human collaboration

We present a manually labeled Author Name Disambiguation (AND) dataset called WhoisWho, which consists of 399,255 documents and 45,187 distinct authors with 421 ambiguous author names. To label such a large amount of AND data with high accuracy, we propose a novel annotation framework in which humans and computers collaborate efficiently and precisely. Within the framework, we also propose an inductive disambiguation model that classifies whether two documents belong to the same author. We evaluate the proposed method and other state-of-the-art disambiguation methods on WhoisWho. The experimental results show that: (1) our model outperforms other disambiguation algorithms on this challenging benchmark; (2) the AND problem remains largely unsolved and requires more in-depth research. We believe that such a large-scale benchmark will bring great value to the author name disambiguation task. We also conduct several experiments showing that our annotation framework helps annotators produce accurate labels efficiently and effectively eliminates labeling errors made by human annotators.
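The abstract describes a pairwise decision (do two documents belong to the same author?) but not how those decisions become author profiles. The sketch below is a hypothetical illustration of that grouping step only: a simple coauthor-overlap score (Jaccard similarity with an assumed 0.3 threshold) stands in for the paper's learned inductive model, and union-find merges linked documents into clusters. The function names, document IDs, and coauthor names are made up for illustration.

```python
from typing import Dict, List, Set


def coauthor_jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical pairwise score: Jaccard overlap of two documents' coauthor sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(coauthors: Dict[str, Set[str]], threshold: float = 0.3) -> List[Set[str]]:
    """Link document pairs whose score reaches the threshold, then group them with union-find."""
    parent = {doc: doc for doc in coauthors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    docs = sorted(coauthors)
    for i, d1 in enumerate(docs):
        for d2 in docs[i + 1:]:
            if coauthor_jaccard(coauthors[d1], coauthors[d2]) >= threshold:
                parent[find(d1)] = find(d2)  # merge the two clusters

    groups: Dict[str, Set[str]] = {}
    for doc in docs:
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(groups.values(), key=min)


# Made-up documents sharing one ambiguous author name.
docs = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"D"}, "p4": {"C", "E"}}
print(cluster_documents(docs))
```

Here p1 and p4 share no coauthors yet end up in the same cluster through p2. Transitive grouping like this is convenient, but it also means a single wrong pairwise link can merge two real authors.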

preprint · 2020 · arXiv

Comparison between $Fermi$ Detected and non-$Fermi$ Detected Superluminal Sources

Active galactic nuclei (AGNs) have been attracting research attention due to their special observable properties. Specifically, some AGNs are detected by Fermi-LAT while others are not, which raises the question of whether any differences exist between the two groups. To address this question, we compiled a sample of 291 superluminal AGNs (189 Fermi-detected sources (FDSs) and 102 non-FDSs) from available multi-wavelength radio, optical, and X-ray (or even $\gamma$-ray) data, Doppler factors, and proper motions ($\mu$) (or apparent velocities ($\beta_{\rm app}$)); calculated the apparent velocity from the proper motion and, for sources with an available Doppler factor ($\delta$), the Lorentz factor ($\Gamma$), viewing angle ($\phi$), and co-moving viewing angle ($\phi_{\rm co}$); and performed statistical analyses for both types. Our study indicates that (1) in terms of average values, FDSs have higher proper motions ($\mu$), apparent velocities ($\beta_{\rm app}$), Doppler factors ($\delta$), and Lorentz factors ($\Gamma$), and smaller viewing angles ($\phi$). Nevertheless, there is no clear difference in co-moving viewing angles ($\phi_{\rm co}$).
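The abstract lists the derived quantities but not the formulas. The sketch below uses the standard relativistic-jet relations commonly applied to superluminal sources: $\beta_{\rm app} = \mu D_L / [c(1+z)]$, $\Gamma = (\beta_{\rm app}^2 + \delta^2 + 1)/(2\delta)$, $\tan\phi = 2\beta_{\rm app}/(\beta_{\rm app}^2 + \delta^2 - 1)$, and the aberration formula $\cos\phi_{\rm co} = (\cos\phi - \beta)/(1 - \beta\cos\phi)$. It is an assumption that the paper uses exactly these forms. The luminosity distance $D_L$ depends on a cosmology the abstract does not specify, so it is passed in directly; all function names are hypothetical.

```python
import math

# CGS conversion constants for turning a proper motion in mas/yr into beta_app.
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)
YEAR_S = 365.25 * 86400.0
MPC_CM = 3.0856775814913673e24
C_CM_S = 2.99792458e10


def apparent_velocity(mu_mas_per_yr: float, d_l_mpc: float, z: float) -> float:
    """beta_app = mu * D_L / (c * (1 + z)); D_L (in Mpc) comes from an assumed cosmology."""
    mu_rad_per_s = mu_mas_per_yr * MAS_TO_RAD / YEAR_S
    return mu_rad_per_s * d_l_mpc * MPC_CM / (C_CM_S * (1.0 + z))


def lorentz_factor(beta_app: float, delta: float) -> float:
    """Gamma = (beta_app^2 + delta^2 + 1) / (2 * delta)."""
    return (beta_app**2 + delta**2 + 1.0) / (2.0 * delta)


def viewing_angle(beta_app: float, delta: float) -> float:
    """phi in radians from tan(phi) = 2 beta_app / (beta_app^2 + delta^2 - 1)."""
    # atan2 keeps phi in [0, pi] even when the denominator is negative.
    return math.atan2(2.0 * beta_app, beta_app**2 + delta**2 - 1.0)


def comoving_viewing_angle(gamma: float, phi: float) -> float:
    """phi_co in radians from cos(phi_co) = (cos(phi) - beta) / (1 - beta * cos(phi))."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    cos_co = (math.cos(phi) - beta) / (1.0 - beta * math.cos(phi))
    return math.acos(max(-1.0, min(1.0, cos_co)))  # clamp against rounding


# Illustrative inputs, not values from the paper.
beta_app, delta = 10.0, 10.0
gamma = lorentz_factor(beta_app, delta)
phi = viewing_angle(beta_app, delta)
phi_co = comoving_viewing_angle(gamma, phi)
print(gamma, math.degrees(phi), math.degrees(phi_co))
```

A quick consistency check is that the derived $\Gamma$ and $\phi$ reproduce the inputs through $\delta = 1/[\Gamma(1 - \beta\cos\phi)]$ and $\beta_{\rm app} = \beta\sin\phi/(1 - \beta\cos\phi)$, and that $\sin\phi_{\rm co} = \delta\sin\phi$.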

preprint · 2014 · arXiv

Social Network Integration: Towards Constructing the Social Graph

In this work, we formulate the problem of social network integration. It takes multiple observed social networks as input and returns an integrated global social graph where each node corresponds to a real person. The key challenge for social network integration is to discover the correspondences or interlinks across different social networks. We conducted an in-depth analysis of three online social networks, AMiner, LinkedIn, and Videolectures, to investigate what reveals users' social identities, whether social factors are consistent across different social networks, and how this information can be leveraged to perform integration. We propose a unified framework for the social network integration task. It crawls data from multiple social networks and then discovers accounts that correspond to the same real person in the obtained networks. We use a probabilistic model to determine such correspondences; it incorporates features such as the consistency of social status and social ties across different networks, as well as a one-to-one mapping constraint and logical transitivity, to jointly make the prediction. Empirical experiments verify the effectiveness of our method.
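The abstract does not spell out the probabilistic model. As a much simpler stand-in, the sketch below shows how the two structural constraints it mentions can be enforced on top of pairwise match scores: a one-to-one mapping between accounts of two networks, and transitive inference across a third network. Greedy highest-score-first matching is a heuristic, not the paper's joint inference; the function names, the 0.5 threshold, the scores, and the account IDs are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def greedy_one_to_one(scores: Dict[Pair, float], threshold: float = 0.5) -> List[Pair]:
    """Accept candidate (left, right) account pairs from highest to lowest score,
    skipping any pair whose accounts are already matched (one-to-one constraint)."""
    used_left, used_right = set(), set()
    matches: List[Pair] = []
    for (left, right), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break  # every remaining pair scores below the threshold
        if left in used_left or right in used_right:
            continue
        used_left.add(left)
        used_right.add(right)
        matches.append((left, right))
    return matches


def transitive_links(ab: List[Pair], bc: List[Pair]) -> List[Pair]:
    """If account a matches b and b matches c, infer that a matches c."""
    b_to_c = dict(bc)
    return [(a, b_to_c[b]) for a, b in ab if b in b_to_c]


# Hypothetical pairwise scores between AMiner and LinkedIn accounts.
scores = {
    ("aminer:1", "linkedin:x"): 0.9,
    ("aminer:1", "linkedin:y"): 0.8,
    ("aminer:2", "linkedin:y"): 0.7,
    ("aminer:3", "linkedin:z"): 0.3,
}
ab = greedy_one_to_one(scores)
print(ab)
print(transitive_links(ab, [("linkedin:x", "videolectures:p")]))
```

The pair ("aminer:1", "linkedin:y") is rejected even though it scores 0.8, because aminer:1 is already matched; that freed linkedin:y for aminer:2. A joint model can trade such decisions off globally, whereas the greedy pass commits to each one in order.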

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint