Modeling Dynamic Computations in the Primate Ventral Visual Stream
A major goal of computational neuroscience has been to explain how the primate ventral visual stream (VVS) transforms visual input into temporally evolving neural representations that support robust visual perception. Historically, most modeling efforts have assumed static conditions: monkeys fixate a dot, images are briefly flashed, and neural responses are analyzed through time-averaged metrics. Feedforward deep networks trained on static object recognition tasks outperform prior work in approximating these static snapshot-driven VVS responses. However, mounting neurophysiological evidence demonstrates that VVS responses are rich dynamical signals shaped not only by the retinal input but also by intrinsic circuit dynamics, recurrent interactions, and widespread top-down modulation. Moreover, real-world vision is inherently dynamic: objects move, the observer moves, and the eyes actively sample the environment. Here, we review recent progress in modeling dynamic responses in the macaque ventral stream across three domains: (1) intrinsic dynamics elicited by static images, (2) dynamics evoked by dynamic visual stimuli, and (3) dynamics generated by active sensing during eye movements. We argue that accurately modeling VVS dynamics will require representational, circuit-level, and behavioral perspectives, including multi-area recurrence, structured E/I interactions, and temporal objectives that better reflect natural behavior. We outline some key missing ingredients and propose a roadmap toward dynamic, multi-timescale models of the primate VVS.