Researcher profile

Xiaobo Hu

Xiaobo Hu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published item(s)

preprint, 2026, arXiv

BabyVision: Visual Reasoning Beyond Language

While humans develop core visual skills long before acquiring language, contemporary Multimodal LLMs (MLLMs) still rely heavily on linguistic priors to compensate for their fragile visual understanding. We uncovered a crucial fact: state-of-the-art MLLMs consistently fail on basic visual tasks that humans, even 3-year-olds, can solve effortlessly. To systematically investigate this gap, we introduce BabyVision, a benchmark designed to assess core visual abilities independent of linguistic knowledge for MLLMs. BabyVision spans a wide range of tasks, with 388 items divided into 22 subclasses across four key categories. Empirical results and human evaluation reveal that leading MLLMs perform significantly below human baselines. Gemini3-Pro-Preview scores 49.7, lagging behind 6-year-old humans and falling well behind the average adult score of 94.1. These results show that, despite excelling in knowledge-heavy evaluations, current MLLMs still lack fundamental visual primitives. Progress in BabyVision represents a step toward human-level visual perception and reasoning capabilities. We also explore solving visual reasoning with generation models by proposing BabyVision-Gen and an automatic evaluation toolkit. Our code and benchmark data are released at https://github.com/UniPat-AI/BabyVision for reproduction.

preprint, 2021, arXiv

Generation and characterization of complex vector modes with digital micromirror devices

Complex vector light modes with a spatially variant polarization distribution have become topical of late, enabling the development of novel applications in numerous research fields. Key to this are the remarkable similarities they hold with quantum entangled states, which arise from the non-separability between the spatial and polarisation degrees of freedom (DoF). As such, the demand for diversification of generation methods and characterization techniques has increased dramatically. Here we put forward a comprehensive tutorial about the use of Digital Micromirror Devices (DMDs) in the generation and characterization of vector modes, providing details on the implementation of techniques that fully exploit the unsurpassed advantages of DMDs, such as their high refresh rates and polarisation independence. We start by briefly describing the operating principles of DMDs and follow with a thorough explanation of some of the methods to shape arbitrary vector modes. Finally, we describe some techniques aiming at the real-time characterization of vector beams. This tutorial highlights the value of DMDs as an alternative tool for the generation and characterization of complex vector light fields, of great relevance in a wide variety of applications.