Source author record

Xinhao Li

Xinhao Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cond-mat.mtrl-sci Machine Learning Computer Vision cond-mat.mes-hall physics.app-ph quant-ph

Catalog footprint

5 works, 6 topics, 4 close collaborators


Preprint, 2025, arXiv

End-to-End Test-Time Training for Long Context

We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer with sliding-window attention. However, our model continues learning at test time via next-token prediction on the given context, compressing the context it reads into its weights. In addition, we improve the model's initialization for learning at test time via meta-learning at training time. Overall, our method, a form of Test-Time Training (TTT), is End-to-End (E2E) both at test time (via next-token prediction) and training time (via meta-learning), in contrast to previous forms. We conduct extensive experiments with a focus on scaling properties. In particular, for 3B models trained with 164B tokens, our method (TTT-E2E) scales with context length in the same way as Transformer with full attention, while others, such as Mamba 2 and Gated DeltaNet, do not. However, similar to RNNs, TTT-E2E has constant inference latency regardless of context length, making it 2.7 times faster than full attention for 128K context. Our code is publicly available.

Preprint, 2024, arXiv

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words. Our core contribution is a scalable approach to autonomously building a high-quality video-text dataset with large language models (LLMs), thereby showcasing its efficacy in learning video-language representations at scale. Specifically, we utilize a multi-scale approach to generate video-related descriptions. Furthermore, we introduce ViCLIP, a video-text representation learning model based on ViT-L. Trained on InternVid via contrastive learning, this model demonstrates leading zero-shot action recognition and competitive video retrieval performance. Beyond basic video understanding tasks like recognition and retrieval, our dataset and model have broad applications. They are particularly beneficial for generating interleaved video-text data for learning a video-centric dialogue system, advancing video-to-text and text-to-video generation research. These proposed resources provide a tool for researchers and practitioners interested in multimodal video understanding and generation.
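ViCLIP is described here only as trained via contrastive learning. As an illustration of what such an objective typically computes, the sketch below implements a CLIP-style symmetric InfoNCE loss over a batch of paired video and text embeddings. The toy embeddings, the temperature of 0.07 and the function names are assumptions for illustration, not details taken from the paper.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_info_nce(video_emb, text_emb, temperature=0.07):
    """CLIP-style loss (assumed form): the i-th video and i-th text are the positive pair."""
    v = [normalize(x) for x in video_emb]
    t = [normalize(x) for x in text_emb]
    n = len(v)
    # Cosine-similarity logits scaled by the (assumed) temperature.
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Video-to-text: each row should select its own caption.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-video: each column should select its own clip.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (v2t + t2v) / 2

videos = [[1.0, 0.0], [0.0, 1.0]]
texts_matched = [[0.9, 0.1], [0.1, 0.9]]
texts_swapped = [texts_matched[1], texts_matched[0]]
print(symmetric_info_nce(videos, texts_matched) < symmetric_info_nce(videos, texts_swapped))
```

Correctly paired clips and captions yield a lower loss than swapped pairs, which is the signal that pulls matching video and text embeddings together.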

Preprint, 2024, arXiv

Learning to (Learn at Test Time)

We reformulate the problem of supervised learning as learning to learn with two nested loops (i.e. learning problems). The inner loop learns on each individual instance with self-supervision before final prediction. The outer loop learns the self-supervised task used by the inner loop, such that its final prediction improves. Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator. For practical comparison with linear or self-attention layers, we replace each of them in a transformer with an inner loop, so our outer loop is equivalent to training the architecture. When each inner-loop learner is a neural network, our approach vastly outperforms transformers with linear attention on ImageNet from 224 x 224 raw pixels in both accuracy and FLOPs, while (regular) transformers cannot run.
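The stated equivalence between a linear inner loop and linear attention can be checked numerically under simple assumptions. The sketch below assumes an inner-loop linear model $W$ initialized at zero, a squared reconstruction loss $\frac{1}{2}\lVert W k_s - v_s \rVert^2$, a learning rate of 1, and gradients taken at the initial $W$ (batch gradient descent over the tokens seen so far); the keys, values and queries are arbitrary toy numbers. Under these assumptions $W_t = \sum_{s \le t} v_s k_s^\top$, so the output $W_t q_t$ equals unnormalized causal linear attention $\sum_{s \le t} v_s (k_s^\top q_t)$.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inner_loop_outputs(keys, values, queries, lr=1.0):
    """Toy inner loop: one gradient step per token on 0.5*||W k - v||^2, taken at W0 = 0."""
    d_out, d_in = len(values[0]), len(keys[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # Gradient at W0 = 0 is (W0 k - v) k^T = -v k^T, so descending adds lr * v k^T.
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] += lr * v[i] * k[j]
        outputs.append(matvec(W, q))
    return outputs

def causal_linear_attention(keys, values, queries):
    """Unnormalized causal linear attention: z_t = sum_{s<=t} v_s (k_s . q_t)."""
    outputs = []
    for t, q in enumerate(queries):
        z = [0.0] * len(values[0])
        for s in range(t + 1):
            w = dot(keys[s], q)
            z = [zi + w * vi for zi, vi in zip(z, values[s])]
        outputs.append(z)
    return outputs

keys = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
values = [[2.0, 1.0, 0.0], [0.0, -1.0, 3.0], [1.0, 1.0, 1.0]]
queries = [[0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
print(inner_loop_outputs(keys, values, queries))
print(causal_linear_attention(keys, values, queries))
```

Both functions print the same outputs. Replacing the linear learner with a neural network breaks this closed form, which is the regime in which the abstract reports gains over linear attention.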

Preprint, 2022, arXiv

Single electrons on solid neon as a solid-state qubit platform

Progress toward the realization of quantum computers requires persistent advances in their constituent building blocks - qubits. Novel qubit platforms that simultaneously embody long coherence, fast operation, and large scalability offer compelling advantages in the construction of quantum computers and many other quantum information systems. Electrons, ubiquitous elementary particles of nonzero charge, spin, and mass, have commonly been perceived as paradigmatic local quantum information carriers. Despite superior controllability and configurability, their practical performance as qubits via either motional or spin states depends critically on their material environment. Here we report our experimental realization of a new qubit platform based upon isolated single electrons trapped on an ultraclean solid neon surface in vacuum. By integrating an electron trap in a circuit quantum electrodynamics architecture, we achieve strong coupling between the motional states of a single electron and a single microwave photon in an on-chip superconducting resonator. Qubit gate operations and dispersive readout are implemented to measure the energy relaxation time $T_1$ of $15~\mu\mathrm{s}$ and phase coherence time $T_2$ over $200~\mathrm{ns}$. These results indicate that the electron-on-solid-neon qubit already performs near the state of the art as a charge qubit.

Preprint, 2022, arXiv

Understanding the onset of surface degradation in LiNiO2 cathodes

Nickel-based layered oxides offer an attractive platform for the development of energy-dense cobalt-free cathodes for lithium-ion batteries but suffer from degradation via oxygen gas release during electrochemical cycling. While such degradation has previously been characterized phenomenologically with experiments, an atomic-scale understanding of the reactions that take place at the cathode surface has been lacking. Here, we develop a first-principles methodology for the prediction of the surface reconstructions of intercalation electrode particles as a function of temperature and state of charge. We report the surface phase diagrams of the LiNiO2 (001) and (104) surfaces and identify surface structures that are likely visited during the first charge and discharge. Our calculations indicate that both surfaces experience oxygen loss during the first charge, resulting in irreversible changes to the surface structures. At the end of charge, the surface Ni atoms migrate into tetrahedral sites, from which they further migrate into Li vacancies during discharge, leading to Li/Ni mixed discharged surface phases. Further, the impact of the temperature and voltage range during cycling on the charge/discharge mechanism is discussed. The present study thus provides insight into the initial stages of cathode surface degradation and lays the foundation for the computational design of cathode materials that are stable against oxygen release.
