GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. A brief overview is available at https://z.ai/blog/glm-4.6v. Code, models and more information are released at https://github.com/zai-org/GLM-V.

Preprint, 2026, arXiv, Open access

V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Haochen Li, Jiale Zhu, Jiali Chen, Jiaxing Xu, Jiazheng Xu, Jing Chen, Jinghao Lin, Jinhao Chen, Jinjiang Wang, Junjie Chen, Leqi Lei, Letian Gong, Leyi Pan, Mingdao Liu, Mingde Xu, Mingzhi Zhang, Qinkai Zheng, Ruiliang Lyu, Shangqin Tu, Sheng Yang, Shengbiao Meng, Shi Zhong, Shiyu Huang, Shuyuan Zhao, Siyan Xue, Tianshu Zhang, Tianwei Luo, Tianxiang Hao, Tianyu Tong, Wei Jia, Wenkai Li, Xiao Liu, Xiaohan Zhang, Xin Lyu, Xinyu Zhang, Xinyue Fan, Xuancheng Huang, Yadong Xue, Yanfeng Wang, Yanling Wang, Yanzi Wang, Yifan An, Yifan Du, Yiheng Huang, Yilin Niu, Yiming Shi, Yu Wang, Yuan Wang, Yuanchang Yue, Yuchen Li, Yusen Liu, Yutao Zhang, Yuting Wang, Yuxuan Zhang, Zhao Xue, Zhengxiao Du, Zhenyu Hou, Zihan Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang

Computer Vision, Machine Learning, Artificial Intelligence
