Source author record

Yan Fang

Yan Fang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; eess.SY; Neural and Evolutionary Computing; Robotics; Systems and Control; Artificial Intelligence; Computation and Language; Emerging Technologies; Neurons and Cognition; physics.acc-ph

Catalog footprint

What is connected

7 works

10 topics

4 close collaborators


Preprint, 2026, arXiv

Let ViT Speak: Generative Language-Image Pre-training

In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLIP trains a ViT to predict language tokens directly from visual tokens using a standard language modeling objective, without contrastive batch construction or an additional text decoder. This design offers three key advantages: (1) Simplicity: a single transformer jointly models visual and textual tokens; (2) Scalability: it scales effectively with both data and model size; and (3) Performance: it achieves competitive or superior results across diverse multimodal benchmarks. Trained on 8B samples from Recap-DataComp-1B, GenLIP matches or surpasses strong baselines despite using substantially less pretraining data. After continued pretraining on multi-resolution images at native aspect ratios, GenLIP further improves on detail-sensitive tasks such as OCR and chart understanding, making it a strong foundation for vision encoders in MLLMs.
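
As a rough illustration of the objective this abstract describes, and not the paper's exact formulation, a standard language modeling loss conditioned on visual tokens can be written as follows. The notation is assumed here: v_1..v_N are the visual tokens, w_1..w_T are the caption tokens, and theta stands for the parameters of the single shared transformer. The abstract does not specify the attention masking or tokenization, so none is implied.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(w_t \mid v_1, \dots, v_N,\; w_1, \dots, w_{t-1}\right)
```

Under this reading, no contrastive pairing across the batch is needed, because each image-caption sample contributes its own next-token terms.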

Preprint, 2020, arXiv

Bio-inspired Gait Imitation of Hexapod Robot Using Event-Based Vision Sensor and Spiking Neural Network

Learning how to walk is a sophisticated neurological task for most animals. In order to walk, the brain must synthesize multiple cortices, neural circuits, and diverse sensory inputs. Some animals, like humans, imitate surrounding individuals to speed up their learning. When humans watch their peers, visual data is processed through the visual cortex in the brain. This complex problem of imitation-based learning forms associations between visual data and muscle actuation through Central Pattern Generation (CPG). Reproducing this imitation phenomenon on low-power, energy-constrained robots that are learning to walk remains challenging and unexplored. We propose a bio-inspired feed-forward approach based on neuromorphic computing and event-based vision to address the gait imitation problem. The proposed method trains a "student" hexapod to walk by watching an "expert" hexapod moving its legs. The student processes the flow of Dynamic Vision Sensor (DVS) data with a one-layer Spiking Neural Network (SNN). The SNN of the student successfully imitates the expert within a small convergence time of ten iterations and exhibits energy efficiency at the sub-microjoule level.

Preprint, 2020, arXiv

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems

We present ConvLab-2, an open-source toolkit that enables researchers to build task-oriented dialogue systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems. As the successor of ConvLab (Lee et al., 2019b), ConvLab-2 inherits ConvLab's framework but integrates more powerful dialogue models and supports more datasets. Besides, we have developed an analysis tool and an interactive tool to assist researchers in diagnosing dialogue systems. The analysis tool presents rich statistics and summarizes common mistakes from simulated dialogues, which facilitates error analysis and system improvement. The interactive tool provides a user interface that allows developers to diagnose an assembled dialogue system by interacting with the system and modifying the output of each system component.

Preprint, 2020, arXiv

Learning to Walk: Spike Based Reinforcement Learning for Hexapod Robot Central Pattern Generation

Learning to walk, i.e., learning locomotion under performance and energy constraints, continues to be a challenge in legged robotics. Methods such as stochastic gradient and deep reinforcement learning (RL) have been explored for bipeds, quadrupeds and hexapods. These techniques are computationally intensive and often prohibitive for edge applications. These methods rely on complex sensors and pre-processing of data, which further increases energy and latency. Recent advances in spiking neural networks (SNNs) promise a significant reduction in computing owing to the sparse firing of neurons and have been shown to integrate reinforcement learning mechanisms with biologically observed spike time dependent plasticity (STDP). However, training a legged robot to walk by learning the synchronization patterns of central pattern generators (CPG) in an SNN framework has not been shown. This can marry the efficiency of SNNs with synchronized locomotion of CPG based systems providing breakthrough end-to-end learning in mobile robotics. In this paper, we propose a reinforcement based stochastic weight update technique for training a spiking CPG. The whole system is implemented on a lightweight Raspberry Pi platform with integrated sensors, thus opening up exciting new possibilities.

Preprint, 2015, arXiv

A Simplified Phase Model for Oscillator Based Computing

Building oscillator based computing systems with emerging nano-device technologies has become a promising solution for unconventional computing tasks like computer vision and pattern recognition. However, simulation and analysis of these systems is both time and compute intensive due to the nonlinearity of new devices and the complex behavior of coupled oscillators. In order to speed up the simulation of coupled oscillator systems, we propose a simplified phase model to perform phase and frequency synchronization prediction based on a synthesis of earlier models. Our model can predict the frequency locking behavior with several orders of magnitude speedup compared to direct evaluation, enabling the effective and efficient simulation of the large numbers of oscillators required for practical computing systems.
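
To make the idea of frequency locking concrete, here is a minimal sketch using the textbook Kuramoto phase model for two coupled oscillators. This is a generic stand-in chosen for illustration and is not the simplified phase model proposed in the paper; the function names, parameters, and the Euler integration scheme are all assumptions.

```python
import math


def simulate_pair(omega1, omega2, coupling, dt=0.01, steps=20_000):
    """Euler-integrate two Kuramoto-coupled phase oscillators (hypothetical helper).

    Returns the average frequency of each oscillator, measured over the
    second half of the run so that the initial transient is discarded.
    """
    theta1, theta2 = 0.0, 0.0
    half = steps // 2
    start1 = start2 = 0.0
    for step in range(steps):
        if step == half:
            # Record phases at the start of the measurement window.
            start1, start2 = theta1, theta2
        # Each oscillator is pulled toward the phase of the other.
        d1 = omega1 + coupling * math.sin(theta2 - theta1)
        d2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += d1 * dt
        theta2 += d2 * dt
    window = (steps - half) * dt
    return (theta1 - start1) / window, (theta2 - start2) / window


def is_locked(f1, f2, tol=1e-3):
    """Treat two oscillators as frequency locked if their average frequencies agree."""
    return abs(f1 - f2) < tol
```

For this particular model the pair locks when the natural frequency difference is at most twice the coupling strength. With natural frequencies 1.0 and 1.2, calling simulate_pair(1.0, 1.2, 0.5) meets that condition while simulate_pair(1.0, 1.2, 0.05) does not, so is_locked can be used to tell the two cases apart. Direct time stepping like this is the kind of evaluation that becomes expensive for large oscillator networks.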

Preprint, 2014, arXiv

Image Segmentation Using Frequency Locking of Coupled Oscillators

Synchronization of coupled oscillators is observed at multiple levels of neural systems, and has been shown to play an important function in visual perception. We propose a computing system based on locally coupled oscillator networks for image segmentation. The system can serve as the preprocessing front-end of an image processing pipeline where the common frequencies of clusters of oscillators reflect the segmentation results. To demonstrate the feasibility of our design, the system is simulated and tested on a human face image dataset and its performance is compared with traditional intensity threshold based algorithms. Our system shows both better performance and higher noise tolerance than traditional methods.

Preprint, 2014, arXiv

Mismatch study of C-ADS main linac

The ADS accelerator in China is a CW (Continuous-Wave) proton linac with 1.5 GeV in beam energy, 10 mA in beam current, and 15 MW in beam power. To meet the extremely low beam loss rate requirement and high reliability, it is very important to study the beam halo caused by beam mismatch, which is one major source of beam loss. To avoid the envelope instability, the phase advances per period are all smaller than 90 degrees in the main linac design. In this paper, the results of the emittance growth and the envelope oscillations caused by mismatch in the main linac section are presented. To meet the emittance growth requirement, the transverse and longitudinal mismatch factors should be smaller than 0.4 and 0.3, respectively.
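
The three beam parameters quoted in this abstract are mutually consistent, which a short worked check shows. The symbols are assumed notation: E_k is the kinetic energy per proton, e the elementary charge, and I the average beam current of the CW beam.

```latex
P = \frac{E_k}{e}\, I
  = \left(1.5 \times 10^{9}\ \mathrm{V}\right) \times \left(10 \times 10^{-3}\ \mathrm{A}\right)
  = 1.5 \times 10^{7}\ \mathrm{W}
  = 15\ \mathrm{MW}
```

Because each proton carries one elementary charge, an energy in eV divided by e gives an equivalent voltage, so beam power is that voltage multiplied by the current.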
