Source author record

Han Cai

Han Cai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Information Theory, Machine Learning, math.IT, quant-ph, physics.optics, cond-mat.mes-hall, Artificial Intelligence, cond-mat.quant-gas, physics.atom-ph, Computation and Language, cond-mat.str-el, Information Retrieval

Catalog footprint

What is connected

19 works

13 topics

4 close collaborators

Preprint · 2026 · arXiv

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, limiting their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address this limitation, we introduce AnyFlow, the first any-step video diffusion distillation framework based on flow maps. Instead of distilling a model for only a few fixed sampling steps, AnyFlow optimizes the full ODE sampling trajectory. To this end, we shift the distillation target from endpoint consistency mapping $(z_{t}\rightarrow z_{0})$ to flow-map transition learning $(z_{t}\rightarrow z_{r})$ over arbitrary time intervals. We further propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions, enabling efficient on-policy distillation that reduces test-time errors (i.e., discretization error in few-step sampling and exposure bias in causal generation). Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B parameters, demonstrate that AnyFlow matches or surpasses consistency-based counterparts in the few-step regime, while continuing to improve as the sampling-step budget grows.
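
A toy example can make the distinction between an Euler rollout and a flow-map transition concrete. The sketch below is not AnyFlow's method and involves no learned model: it assumes a hypothetical scalar velocity field $v(z,\tau) = z$, whose exact flow map from time $t$ to time $r$ is $z_r = z_t e^{r-t}$. Composing two shortcut transitions $z_t \rightarrow z_s \rightarrow z_r$ reproduces the direct transition, while a uniform Euler rollout from $t=1$ to $r=0$ carries a discretization error that shrinks as the step count grows. The helper names are illustrative only.

```python
import math

def velocity(z: float, tau: float) -> float:
    """Hypothetical toy velocity field v(z, tau) = z (stands in for a learned network)."""
    return z

def flow_map(z_t: float, t: float, r: float) -> float:
    """Exact flow map of dz/dtau = z: jumps from time t to time r in one shot."""
    return z_t * math.exp(r - t)

def euler_rollout(z_t: float, t: float, r: float, steps: int) -> float:
    """Integrate dz/dtau = v(z, tau) from t to r with `steps` uniform Euler steps."""
    h = (r - t) / steps  # negative when integrating backward from noise (t=1) to data (r=0)
    z, tau = z_t, t
    for _ in range(steps):
        z = z + h * velocity(z, tau)
        tau += h
    return z

z1 = 1.0  # toy "noise" sample at t = 1
exact = flow_map(z1, 1.0, 0.0)
# Two shortcut transitions 1 -> 0.4 -> 0 agree with the direct map 1 -> 0.
composed = flow_map(flow_map(z1, 1.0, 0.4), 0.4, 0.0)
print(f"direct {exact:.6f}, composed {composed:.6f}")
# The Euler rollout's discretization error shrinks as the step budget grows.
for n in (1, 2, 4, 8, 32):
    print(f"Euler, {n:2d} steps: error {abs(euler_rollout(z1, 1.0, 0.0, n) - exact):.6f}")
```

The semigroup property shown here (composed maps equal the direct map) is what makes it meaningful to learn transitions over arbitrary intervals rather than only endpoint mappings.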

Preprint · 2022 · arXiv

A Bound on the Minimal Field Size of LRCs, and Cyclic MR Codes That Attain It

We prove a new lower bound on the field size of locally repairable codes (LRCs). Additionally, we construct maximally recoverable (MR) codes which are cyclic. While a known construction for MR codes has the same parameters, it produces non-cyclic codes. Furthermore, we prove both necessary conditions and sufficient conditions that specify when the known non-cyclic MR codes may be permuted to become cyclic, thus proving that our construction produces cyclic MR codes with new parameters. Moreover, using our new bound on the field size, we show that the new cyclic MR codes have optimal field size in certain cases. Other known LRCs are also shown to have optimal field size in certain cases.

Preprint · 2022 · arXiv

A Class of Minimum Storage Cooperative Regenerating Codes with Low Access Property

In this paper, a new repair scheme for a modified construction of MDS codes is studied. The obtained repair scheme has optimal bandwidth for multiple failed nodes under the cooperative repair model. In addition, the repair scheme has a relatively low access property: the amount of data accessed is less than twice the optimal value.

Preprint · 2022 · arXiv

A New Cooperative Repair Scheme with k + 1 Helper Nodes for (n, k) Hadamard MSR codes with Small Sub-packetization

The cooperative repair model is a technique for handling multiple node failures in distributed storage systems. Recently, explicit constructions of cooperative MSR codes were given by Ye (IEEE Transactions on Information Theory, 2020) with sub-packetization level $(d-k+h)(d-k+1)^n$. Specifically, the sub-packetization level is $(h+1)2^n$ when $d=k+1$. In this paper, we propose a new cooperative repair scheme with $d=k+1$ helper nodes, based on the inter-instance and intra-instance pairing inherited from the perfect code, which reduces the sub-packetization to $2^n$ when $(h+1)\mid 2^n$, and to $(2\ell+1)2^n$ when $h+1=(2\ell+1)2^m$ for $m\ge 0$, $\ell\ge 1$. That is, the sub-packetization is $h+1$ times or $2^m$ times smaller than Ye's. This is the best result known so far.
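
The sub-packetization comparison above can be checked with a little arithmetic. The sketch below only encodes the formulas quoted in the abstract (Ye's $(d-k+h)(d-k+1)^n$ and the two cases of the new scheme with $d=k+1$); the function names and example parameters are hypothetical, and nothing about the repair scheme itself is modeled.

```python
def ye_subpacketization(n: int, k: int, d: int, h: int) -> int:
    """Ye (2020) cooperative MSR sub-packetization: (d - k + h) * (d - k + 1)^n."""
    return (d - k + h) * (d - k + 1) ** n

def new_subpacketization(n: int, h: int) -> int:
    """New scheme with d = k + 1 helper nodes, following the two cases in the abstract."""
    if (2 ** n) % (h + 1) == 0:      # case (h + 1) | 2^n
        return 2 ** n
    odd = h + 1
    while odd % 2 == 0:              # strip factors of 2 so that h + 1 = odd * 2^m
        odd //= 2
    if odd == 1:
        raise ValueError("h + 1 is a power of two not dividing 2^n; case not covered")
    return odd * 2 ** n              # odd = 2*l + 1 with l >= 1

for n, k, h in [(6, 3, 3), (8, 3, 5)]:
    d = k + 1
    old, new = ye_subpacketization(n, k, d, h), new_subpacketization(n, h)
    print(f"n={n}, k={k}, h={h}: Ye {old}, new {new}, ratio {old // new}")
```

In the first example $h+1 = 4$ divides $2^6$, so the reduction factor is $h+1 = 4$ (256 versus 64); in the second, $h+1 = 6 = 3 \cdot 2^1$, so the reduction factor is $2^m = 2$ (1536 versus 768).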

Preprint · 2022 · arXiv

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start by introducing popular model compression methods, including pruning, factorization, quantization and compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
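
Of the compression methods listed above, pruning is the simplest to illustrate. This is a minimal sketch of unstructured magnitude pruning, assuming a flat list of weights and a hypothetical `magnitude_prune` helper; practical pipelines typically prune per layer or per channel and fine-tune afterwards, neither of which is shown.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
```

Half of the six weights (0.01, -0.05 and 0.2, the three smallest in magnitude) are set to zero; the sign and value of the surviving weights are unchanged.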

Preprint · 2022 · arXiv

Floquet superradiance lattices in thermal atoms

Floquet modulation has been widely used in optical lattices for coherent control of quantum gases, in particular for synthesizing artificial gauge fields and simulating topological matter. However, such modulation induces heating, which can overwhelm the signal of quantum dynamics in ultracold atoms. Here we report that thermal motion, instead of being a noise source, provides a new control knob in Floquet-modulated superradiance lattices, which are momentum-space tight-binding lattices of collectively excited states of atoms. The Doppler shifts combined with Floquet modulation provide effective forces along arbitrary directions in a lattice in frequency and momentum dimensions. Dynamic localization, dynamic delocalization and chiral edge currents can be simultaneously observed from a single transport spectrum of superradiance lattices in thermal atoms. Our work paves the way for simulating Floquet topological matter in room-temperature atoms and facilitates its applications in photonic devices.

Preprint · 2022 · arXiv

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to their high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. Through gradual shrinking experiments, we reveal that HRNet's high-resolution branches are redundant for models in the low-computation regime. Removing them improves both efficiency and performance. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance its capacity: Fusion Deconv Head and Large Kernel Convs. Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware feature fusion with low overhead. Large Kernel Convs significantly improve the model's capacity and receptive field while maintaining a low computational cost. With only a 25% increase in computation, 7x7 kernels achieve +14.0 mAP over 3x3 kernels on the CrowdPose dataset. On mobile platforms, LitePose reduces latency by up to 5.0x without sacrificing performance compared with prior state-of-the-art efficient pose estimation models, pushing the frontier of real-time multi-person pose estimation on edge. Our code and pre-trained models are released at https://github.com/mit-han-lab/litepose.
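
A rough MAC count suggests why enlarging kernels can be cheap in a mobile-style block: only the depthwise convolution scales with $k^2$, while the 1x1 convolutions usually dominate. The sketch assumes a hypothetical stride-1 MobileNetV2-style inverted residual block (64 channels, expansion ratio 6, 32x32 feature map) in which the large kernel is depthwise; it is not the actual LitePose configuration and does not attempt to reproduce the reported 25% figure.

```python
def inverted_residual_macs(h: int, w: int, c: int, expand: int, k: int) -> int:
    """MACs of a hypothetical stride-1 block: 1x1 expand (c -> expand*c),
    k x k depthwise on expand*c channels, 1x1 project (expand*c -> c)."""
    hidden = expand * c
    expand_macs = h * w * c * hidden          # pointwise expansion
    depthwise_macs = h * w * hidden * k * k   # one k x k filter per channel
    project_macs = h * w * hidden * c         # pointwise projection
    return expand_macs + depthwise_macs + project_macs

small = inverted_residual_macs(32, 32, 64, 6, 3)
large = inverted_residual_macs(32, 32, 64, 6, 7)
print(f"3x3: {small:,} MACs, 7x7: {large:,} MACs, increase {large / small - 1:.0%}")
```

Although the depthwise cost grows by $49/9 \approx 5.4\times$, the whole block in this toy grows by under a third, because the two pointwise convolutions are unaffected by the kernel size.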

Preprint · 2022 · arXiv

Network Augmentation for Tiny Deep Learning

We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks. Existing regularization techniques (e.g., data augmentation, dropout) have shown much success on large neural networks by adding noise to overcome over-fitting. However, we find that these techniques hurt the performance of tiny neural networks. We argue that training tiny models is different from training large models: rather than augmenting the data, we should augment the model, since tiny models tend to suffer from under-fitting rather than over-fitting due to limited capacity. To alleviate this issue, NetAug augments the network (reverse dropout) instead of inserting noise into the dataset or the network. It embeds the tiny model into larger models and encourages it to work as a sub-model of those larger models to get extra supervision, in addition to functioning as an independent model. At test time, only the tiny model is used for inference, incurring zero inference overhead. We demonstrate the effectiveness of NetAug on image classification and object detection. NetAug consistently improves the performance of tiny models, achieving up to 2.2% accuracy improvement on ImageNet. On object detection, at the same level of performance, NetAug requires 41% fewer MACs on Pascal VOC and 38% fewer MACs on COCO than the baseline.

Preprint · 2020 · arXiv

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them jointly. To deal with the larger design space this brings, a promising approach is to train a quantization-aware accuracy predictor to quickly estimate the accuracy of a quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and is thus highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves sample efficiency. Besides, collecting the dataset for the fp32 accuracy predictor only requires evaluating neural networks, without any training cost, by sampling from a pretrained once-for-all network, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing GPU hours and CO2 emission by orders of magnitude, pushing the frontier for environmentally friendly green AI. The code and video are publicly available.
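
The fp32-to-int8 conversion mentioned above can be sketched with symmetric per-tensor linear quantization, one common int8 scheme. This is an illustrative assumption: APQ searches quantization policies per layer, which the hypothetical helpers `quantize_int8` and `dequantize` below do not model.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor linear quantization: q = clamp(round(v / scale), -127, 127)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate fp32 values."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.01]
codes, scale = quantize_int8(weights)
print(codes, scale)
print(dequantize(codes, scale))
```

Choosing the scale from the largest magnitude guarantees no value is clipped, and each reconstructed value lies within half a quantization step (`scale / 2`) of the original.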

Preprint · 2020 · arXiv

Many-body chiral edge currents and sliding phases of atomic spinwaves in momentum-space lattice

Collective excitations (spinwaves) of long-lived atomic hyperfine states can be synthesized into a Bose-Hubbard model in momentum space. We explore many-body ground states and dynamics of a two-leg momentum-space lattice formed by two coupled hyperfine states. Essential ingredients of this setting are a staggered artificial magnetic field engineered by lasers that couple the spinwave states, and a state-dependent long-range interaction, which is induced by laser-dressing a hyperfine state to a Rydberg state. The Rydberg-dressed two-body interaction gives rise to a state-dependent blockade in momentum space, and can amplify staggered-flux-induced anti-chiral edge currents in the many-body ground state in the presence of magnetic flux. When the Rydberg dressing is applied to both hyperfine states, exotic sliding insulating and superfluid/supersolid phases emerge. Due to the Rydberg-dressed long-range interaction, spinwaves slide along a leg of the momentum-space lattice without costing energy. Our study paves a route to the quantum simulation of topological phases and exotic dynamics with interacting spinwaves of atomic hyperfine states in momentum-space lattices.

Preprint · 2020 · arXiv

On Optimal Locally Repairable Codes and Generalized Sector-Disk Codes

Optimal locally repairable codes with information locality are considered. Optimal codes are constructed, whose length is also order-optimal with respect to a new bound on the code length derived in this paper. The length of the constructed codes is super-linear in the alphabet size, which improves upon the well-known pyramid codes, whose length is only linear in the alphabet size. The recoverable erasure patterns are also analyzed for the new codes. Based on the recoverable erasure patterns, we construct generalized sector-disk (GSD) codes, which can recover from disk erasures mixed with sector erasures in a more general setting than known sector-disk (SD) codes. Additionally, the number of sectors in the constructed GSD codes is super-linear in the alphabet size, compared with known SD codes, whose number of sectors is only linear in the alphabet size.
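
For orientation, optimality of an LRC with information locality $r$ is commonly measured against the well-known Singleton-like bound $d \le n - k - \lceil k/r \rceil + 2$. The abstract does not state the bound explicitly, so reading "optimal" this way is an assumption here, and the helper names below are hypothetical.

```python
def singleton_like_bound(n: int, k: int, r: int) -> int:
    """Largest minimum distance allowed for an [n, k] code with information locality r:
    d <= n - k - ceil(k / r) + 2."""
    if not (1 <= r <= k <= n):
        raise ValueError("need 1 <= r <= k <= n")
    return n - k - (-(-k // r)) + 2  # -(-k // r) is integer ceil(k / r)

def is_optimal_lrc(n: int, k: int, r: int, d: int) -> bool:
    """Call an LRC optimal if its minimum distance meets the bound with equality."""
    return d == singleton_like_bound(n, k, r)

print(singleton_like_bound(15, 8, 4))  # 7
print(singleton_like_bound(10, 6, 6))  # 5, i.e. the classical Singleton bound n - k + 1
```

Setting $r = k$ removes the locality constraint and recovers the classical Singleton bound $n - k + 1$; smaller $r$ buys cheaper repair at the cost of a lower achievable distance.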

Preprint · 2020 · arXiv

Once-for-All: Train One Network and Specialize it for Efficient Deployment

We address the challenging problem of efficient inference across many devices and resource constraints, especially on edge devices. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (emitting as much $CO_2$ as five cars over their lifetimes) and thus unscalable. In this work, we propose to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training and search, to reduce the cost. We can quickly get a specialized sub-network by selecting from the OFA network without additional training. To efficiently train OFA networks, we also propose a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than pruning (depth, width, kernel size, and resolution). It can obtain a surprisingly large number of sub-networks ($> 10^{19}$) that can fit different hardware platforms and latency constraints while maintaining the same level of accuracy as training independently. On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and $CO_2$ emission by many orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting ($<$600M MACs). OFA is the winning solution for the 3rd Low Power Computer Vision Challenge (LPCVC), DSP classification track, and the 4th LPCVC, both classification and detection tracks. Code and 50 pre-trained models (for many devices and many latency constraints) are released at https://github.com/mit-han-lab/once-for-all.
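
The $> 10^{19}$ figure can be reproduced with a short count, assuming the MobileNetV3-based space described in the OFA paper: five units, each with depth in {2, 3, 4}, where every layer independently picks an expansion ratio in {3, 4, 6} and a kernel size in {3, 5, 7}. Input resolution is left out of the count. The function `ofa_subnetwork_count` is hypothetical and not part of the released once-for-all code.

```python
def ofa_subnetwork_count(units: int = 5,
                         depths: tuple[int, ...] = (2, 3, 4),
                         widths: tuple[int, ...] = (3, 4, 6),
                         kernels: tuple[int, ...] = (3, 5, 7)) -> int:
    """Count architectures in an OFA-style elastic space (resolution not included)."""
    per_layer = len(widths) * len(kernels)          # independent choices per layer
    per_unit = sum(per_layer ** d for d in depths)  # a unit of depth d has per_layer**d variants
    return per_unit ** units                        # units are chosen independently

total = ofa_subnetwork_count()
print(f"{total:.3e} sub-networks")
```

Each unit contributes $9^2 + 9^3 + 9^4 = 7371$ variants, and five independent units give $7371^5$, roughly $2 \times 10^{19}$.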

Preprint · 2020 · arXiv

Topological phases of quantized light

Topological photonics is an emerging research area that focuses on the topological states of classical light. Here we reveal the topological phases that are intrinsic to the particle nature of light, i.e., solely related to the quantized Fock states and the inhomogeneous coupling between them. The Hamiltonian of two cavities coupled with a two-level atom is an intrinsic one-dimensional Su-Schrieffer-Heeger model of Fock states. By adding another cavity, the Fock-state lattice is extended to two dimensions with a honeycomb structure, where the strain due to the inhomogeneity of the coupling strengths induces a Lifshitz topological phase transition between a semimetal and a band insulator. In the semimetallic phase, the strain is equivalent to a pseudomagnetic field, which results in the quantization of the Landau levels and the valley Hall effect. We further construct a Haldane model where the topological phases can be characterized by the topological markers. This study demonstrates a fundamental distinction between the topological phases of bosons and fermions and provides a novel platform for studying topological physics in dimensions higher than three.
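
The one-dimensional Fock-state chain can be written down explicitly. This sketch assumes the resonant interaction-picture Hamiltonian $H = g(a\sigma_+ + a^\dagger\sigma_-) + g(b\sigma_+ + b^\dagger\sigma_-)$ with equal coupling $g$ to cavities $a$ and $b$. With the total excitation number $N$ conserved, the states $|n_a, n_b, \downarrow\rangle$ and $|n_a, n_b, \uparrow\rangle$ form a chain whose bonds alternate between $g\sqrt{n_a}$ (atom absorbs from $a$) and $g\sqrt{n_b+1}$ (atom emits into $b$). The helper name is hypothetical.

```python
import math

def fock_chain_hoppings(n_total: int) -> list[float]:
    """Bond amplitudes (in units of g) along the chain
    |N,0,dn> - |N-1,0,up> - |N-1,1,dn> - |N-2,1,up> - ... - |0,N,dn>."""
    hops = []
    for j in range(n_total):
        n_a, n_b = n_total - j, j          # photon numbers on the j-th ground-state site
        hops.append(math.sqrt(n_a))        # |n_a, n_b, dn> -> |n_a - 1, n_b, up>
        hops.append(math.sqrt(n_b + 1))    # |n_a - 1, n_b, up> -> |n_a - 1, n_b + 1, dn>
    return hops

print([round(t, 3) for t in fock_chain_hoppings(3)])  # [1.732, 1.0, 1.414, 1.414, 1.0, 1.732]
```

The $\sqrt{n}$ factors make the coupling inhomogeneous: for $N = 3$ the chain starts with a strong-weak bond pair, the bonds become equal in the middle, and the dimerization is reversed at the other end.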

Preprint · 2016 · arXiv

Mesoscopic Superposition States Generated by Synthetic Spin-orbit Interaction in Fock-state Lattices

Mesoscopic superposition states of photons can be prepared in three cavities interacting with the same two-level atom. By periodically modulating the three cavity frequencies around the transition frequency of the atom with $2\pi/3$ phase difference, the time reversal symmetry is broken and an optical circulator is generated with chiralities depending on the quantum state of the atom. A superposition of the atomic states can guide photons from one cavity to a mesoscopic superposition of the other two cavities. The physics can be understood in a finite spin-orbit-coupled Fock-state lattice where the atom and the cavities carry the spin and the orbit degrees of freedom, respectively. This scheme can be realized in circuit QED architectures and provides a new platform for exploring quantum information and topological physics in novel lattices.

Preprint · 2016 · arXiv

Optimal Locally Repairable Systematic Codes Based on Packings

Locally repairable codes are desirable for distributed storage systems to improve repair efficiency. In this paper, we first build a bridge between locally repairable codes and packings. As an application of this bridge, some optimal locally repairable codes can be obtained from packings, which yields optimal locally repairable codes with flexible parameters.

Preprint · 2016 · arXiv

Product-based Neural Networks for User Response Prediction

Predicting user responses, such as clicks and conversions, is of great importance and has found use in many Web applications, including recommender systems, web search and online advertising. The data in those applications is mostly categorical and contains multiple fields; a typical representation is to transform it into a high-dimensional sparse binary feature representation via one-hot encoding. Faced with this extreme sparsity, traditional models may be limited to mining shallow patterns from the data, i.e., low-order feature combinations. Deep models like deep neural networks, on the other hand, cannot be directly applied to the high-dimensional input because of the huge feature space. In this paper, we propose Product-based Neural Networks (PNN) with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions. Our experimental results on two large-scale real-world ad click datasets demonstrate that PNNs consistently outperform the state-of-the-art models on various metrics.
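
A minimal sketch of the embedding and product steps, assuming the inner-product variant of the product layer: each one-hot field selects one embedding row, and the layer emits the inner product of every pair of field embeddings. In the full model these product signals are combined with learned weights and fed to the fully connected layers, which are omitted here; the names and toy data are hypothetical.

```python
from itertools import combinations

def embed(field_ids: list[int], tables: list[list[list[float]]]) -> list[list[float]]:
    """One-hot input per field: look up that field's embedding row."""
    return [tables[f][idx] for f, idx in enumerate(field_ids)]

def inner_product_layer(embeddings: list[list[float]]) -> list[float]:
    """Pairwise inner products <e_i, e_j> for every field pair i < j."""
    return [sum(a * b for a, b in zip(e_i, e_j))
            for e_i, e_j in combinations(embeddings, 2)]

# Hypothetical toy data: 3 fields, 2 categories each, embedding dimension 2.
tables = [
    [[1.0, 0.0], [0.0, 1.0]],   # field 0
    [[2.0, 1.0], [1.0, 2.0]],   # field 1
    [[0.0, 3.0], [1.0, 1.0]],   # field 2
]
emb = embed([0, 1, 1], tables)
signals = inner_product_layer(emb)
print(signals)  # [1.0, 1.0, 3.0]
```

With $F$ fields the layer produces $F(F-1)/2$ pairwise signals, each capturing an interaction between two different fields (inter-field categories) rather than between raw one-hot positions.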

Preprint · 2016 · arXiv

Quantum coherence between cavity and artificial atom in a superconducting circuit QED ladder system

We have created a quantum three-level ladder system with the cavity dispersive energy level in a superconducting circuit quantum electrodynamics system consisting of a transmon qubit and a cavity, and have directly observed the Autler-Townes splitting effect instead of representing it by the probability of the qubit being at each level. A coupler tone is applied on the transition between the second excited state of the transmon and the cavity dispersive level, while the cavity spectrum is probed. A doublet in the transmission spectrum and an anomalous dispersion spectrum of the cavity level are clearly observed. The inverse Fourier transform of the cavity spectrum indicates a coherent Rabi oscillation of the populations between the cavity and the qubit.

Preprint · 2016 · arXiv

Symmetry protected single photon subradiance

We study the protection of subradiant states by the symmetry of the atomic distributions in the Dicke limit, in which collective Lamb shifts cannot be neglected. We find that anti-symmetric states are subradiant states for distributions with reflection symmetry. Continuous symmetry can also be used to achieve subradiance. This study is relevant to the problem of robust quantum memory with long storage time and fast readout.
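
In the ideal Dicke limit (all atoms well within one wavelength, identical dipole orientation), a single-excitation state $\sum_j c_j |e_j\rangle$ decays at rate $\Gamma |\sum_j c_j|^2 / \sum_j |c_j|^2$, so any state whose amplitudes sum to zero, such as a state that is anti-symmetric under reflection, is dark. The sketch below only evaluates this decay-rate formula; it ignores the collective Lamb shifts that the abstract says cannot be neglected, and the helper name is hypothetical.

```python
def dicke_limit_decay_rate(amplitudes: list[complex], gamma: float = 1.0) -> float:
    """Decay rate of sum_j c_j |e_j> in the ideal Dicke limit:
    Gamma * |sum_j c_j|^2 / sum_j |c_j|^2 (the state need not be pre-normalized)."""
    norm_sq = sum(abs(c) ** 2 for c in amplitudes)
    return gamma * abs(sum(amplitudes)) ** 2 / norm_sq

symmetric = [1, 1, 1, 1]             # superradiant: rate N * Gamma for N = 4
antisymmetric = [1, 0.5, -0.5, -1]   # c_j = -c_{N+1-j} under reflection, so the sum vanishes
print(dicke_limit_decay_rate(symmetric), dicke_limit_decay_rate(antisymmetric))  # 4.0 0.0
```

Reflection anti-symmetry forces the amplitudes to cancel pairwise, which is why such states stay dark regardless of the particular values of $c_j$.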

Preprint · 2015 · arXiv

Topological phase transitions in superradiance lattices

Topological phases of matter are of fundamental interest and have promising applications. Fascinating topological properties of light have been unveiled in classical optical materials. However, the manifestation of topological physics in quantum optics has yet to be discovered. Here we study the topological phases in a two-dimensional momentum-space superradiance lattice composed of timed Dicke states (TDS) in electromagnetically induced transparency (EIT). By periodically modulating the three EIT coupling fields, we can create a Haldane model with in-situ tunable topological properties, which manifest themselves in the contrast between diffraction signals emitted by superradiant TDS. The topological superradiance lattices provide a controllable platform for simulating exotic phenomena in condensed matter physics and offer a basis for topological quantum optics and novel photonic devices.