Source author record

Xiaobo Hu

Xiaobo Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, Computer Vision, Cryptography and Security, physics.optics

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2026, arXiv

BabyVision: Visual Reasoning Beyond Language

While humans develop core visual skills long before acquiring language, contemporary Multimodal LLMs (MLLMs) still rely heavily on linguistic priors to compensate for their fragile visual understanding. We uncovered a crucial fact: state-of-the-art MLLMs consistently fail on basic visual tasks that humans, even 3-year-olds, can solve effortlessly. To systematically investigate this gap, we introduce BabyVision, a benchmark designed to assess the core visual abilities of MLLMs independent of linguistic knowledge. BabyVision spans a wide range of tasks, with 388 items divided into 22 subclasses across four key categories. Empirical results and human evaluation reveal that leading MLLMs perform significantly below human baselines. Gemini3-Pro-Preview scores 49.7, lagging behind 6-year-old humans and falling well behind the average adult score of 94.1. These results show that, despite excelling in knowledge-heavy evaluations, current MLLMs still lack fundamental visual primitives. Progress on BabyVision represents a step toward human-level visual perception and reasoning capabilities. We also explore solving visual reasoning with generation models by proposing BabyVision-Gen and an automatic evaluation toolkit. Our code and benchmark data are released at https://github.com/UniPat-AI/BabyVision for reproduction.

Preprint, 2021, arXiv

Generation and characterization of complex vector modes with digital micromirror devices

Complex vector light modes with a spatially variant polarization distribution have become topical of late, enabling the development of novel applications in numerous research fields. Key to this is the remarkable similarity they hold with quantum entangled states, which arises from the non-separability between the spatial and polarization degrees of freedom (DoF). As such, the demand for diversification of generation methods and characterization techniques has increased dramatically. Here we put forward a comprehensive tutorial about the use of Digital Micromirror Devices (DMDs) in the generation and characterization of vector modes, providing details on the implementation of techniques that fully exploit the unsurpassed advantages of DMDs, such as their high refresh rates and polarization independence. We start by briefly describing the operating principles of DMDs and follow with a thorough explanation of some of the methods to shape arbitrary vector modes. Finally, we describe some techniques aimed at the real-time characterization of vector beams. This tutorial highlights the value of DMDs as an alternative tool for the generation and characterization of complex vector light fields, of great relevance in a wide variety of applications.

Preprint, 2014, arXiv

A chaotic image encryption scheme owning temp-value feedback

This paper presents a novel, efficient chaotic image encryption scheme in which a temp-value feedback mechanism is introduced into the permutation and diffusion procedures. First, a simple trick maps the plain-image pixels to the initial condition of the Logistic map. Then, a pseudorandom number sequence (PRNS) is obtained by iterating the map. The permutation procedure is carried out with a permutation sequence generated by comparing the PRNS with its sorted version. The diffusion procedure is composed of two rounds executed in opposite directions. During each round, the current plain-image pixel and the last cipher-image pixel are used to produce the current cipher-image pixel with the help of the Logistic map and a pseudorandom number generated by the Chen system. To enhance efficiency, only the expanded XOR operation and modulo-256 addition are employed during diffusion. Experimental results show that the new scheme has a large key space and can resist the differential attack. It is also efficient.
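
The permutation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the Logistic-map parameter `mu=3.99`, the burn-in length, and the hard-coded initial condition are all assumptions for illustration; the paper derives the initial condition from the plain-image pixels, and the diffusion rounds and the Chen system are omitted here.

```python
import numpy as np

def logistic_sequence(x0, n, mu=3.99, burn_in=100):
    """Iterate x <- mu * x * (1 - x) and return n values after a burn-in.

    mu and burn_in are illustrative choices, not values from the paper.
    """
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1.0 - x)
    out = np.empty(n)
    for i in range(n):
        x = mu * x * (1.0 - x)
        out[i] = x
    return out

def permutation_from_prns(prns):
    """Compare the sequence with its sorted version.

    argsort returns, for each slot of the sorted sequence, the index
    that value had in the original sequence; this is a permutation.
    """
    return np.argsort(prns, kind="stable")

def permute(pixels, perm):
    """Shuffle a flat pixel array according to the permutation."""
    return pixels[perm]

def unpermute(shuffled, perm):
    """Invert permute() given the same permutation."""
    restored = np.empty_like(shuffled)
    restored[perm] = shuffled
    return restored

# Hypothetical 4x4 "image", flattened; x0 is an assumed key value.
pixels = np.arange(16, dtype=np.uint8)
perm = permutation_from_prns(logistic_sequence(0.3141, pixels.size))
shuffled = permute(pixels, perm)
recovered = unpermute(shuffled, perm)
```

Anyone who can regenerate the same sequence from the key can rebuild `perm` and undo the shuffle, which is why the permutation is paired with a diffusion stage in schemes of this kind.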