Source author record

Hong Cai

Hong Cai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, math.AP, math.OC

Catalog footprint

What is connected

7 works

3 topics

4 close collaborators

Preprint, 2026, arXiv

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) deliver remarkable image and video generation quality but incur high computational cost, limiting scalability and on-device deployment. We introduce CoReDiT, a structured token pruning framework for DiTs across vision tasks. CoReDiT uses a linear-time spatial coherence score to estimate local redundancy in the latent token lattice and skips high-coherence (redundant) tokens in self-attention. To maintain a dense representation and avoid visual discontinuities, we reconstruct skipped attention outputs via coherence-guided aggregation of spatially neighboring retained tokens. We further introduce a progressive, block-adaptive pruning schedule that increases pruning gradually and allocates larger budgets to blocks and denoising steps with higher redundancy. Across state-of-the-art diffusion backbones including PixArt-α and MagicDrive-V2, CoReDiT achieves up to 55% self-attention FLOPs reduction and inference speedups of 1.33x on cloud GPUs and 1.72x on mobile NPUs, while maintaining high visual quality. Notably, CoReDiT also increases on-device memory headroom, enabling higher-resolution generation.
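The pruning-and-reconstruction idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the coherence score, so the mean cosine similarity to 4-connected lattice neighbors used here is an assumption, as are the function names (`coherence_scores`, `select_pruned`, `reconstruct`), the top-ratio selection rule, and the fallback used when a skipped token has no retained neighbor.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def neighbors(r, c, h, w):
    """Yield the 4-connected neighbors of (r, c) inside an h x w lattice."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w:
            yield rr, cc

def coherence_scores(grid):
    """Assumed score: mean cosine similarity of each token to its neighbors.
    Each token touches at most 4 neighbors, so the cost is linear in tokens."""
    h, w = len(grid), len(grid[0])
    scores = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sims = [cosine(grid[r][c], grid[rr][cc])
                    for rr, cc in neighbors(r, c, h, w)]
            scores[r][c] = sum(sims) / len(sims) if sims else 0.0
    return scores

def select_pruned(scores, ratio):
    """Positions of the highest-coherence tokens (hypothetical selection rule)."""
    flat = sorted(((s, (r, c)) for r, row in enumerate(scores)
                   for c, s in enumerate(row)), reverse=True)
    return {pos for _, pos in flat[:int(ratio * len(flat))]}

def reconstruct(grid, pruned, retained_out):
    """Fill skipped attention outputs with a similarity-weighted average of
    the outputs of retained lattice neighbors."""
    h, w = len(grid), len(grid[0])
    dim = len(next(iter(retained_out.values())))
    full = dict(retained_out)
    for r, c in pruned:
        acc, total = [0.0] * dim, 0.0
        for pos in neighbors(r, c, h, w):
            if pos in retained_out:
                wgt = max(cosine(grid[r][c], grid[pos[0]][pos[1]]), 0.0)
                acc = [a + wgt * v for a, v in zip(acc, retained_out[pos])]
                total += wgt
        if total > 0:
            full[(r, c)] = [a / total for a in acc]
        else:
            # Assumed fallback: plain mean of all retained outputs.
            n = len(retained_out)
            full[(r, c)] = [sum(v[i] for v in retained_out.values()) / n
                            for i in range(dim)]
    return full

# Toy 1 x 3 lattice: the first two tokens are identical, the third differs.
grid = [[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
scores = coherence_scores(grid)
pruned = select_pruned(scores, 0.5)
# Stand-in attention outputs for the retained tokens only.
retained_out = {(0, 1): [2.0, 4.0], (0, 2): [0.0, 1.0]}
full = reconstruct(grid, pruned, retained_out)
```

In this toy case the first token is the most redundant, so it is skipped and its output is copied from its identical neighbor. The progressive, block-adaptive schedule described in the abstract would vary `ratio` per block and denoising step; that part is not sketched here.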

Preprint, 2022, arXiv

Learning Implicit Feature Alignment Function for Semantic Segmentation

Integrating high-level context information with low-level details is of central importance in semantic segmentation. Towards this end, most existing segmentation models apply bilinear up-sampling and convolutions to feature maps of different scales, and then align them at the same resolution. However, bilinear up-sampling blurs the precise information learned in these feature maps, and convolutions incur extra computation costs. To address these issues, we propose the Implicit Feature Alignment function (IFA). Our method is inspired by the rapidly expanding topic of implicit neural representations, where coordinate-based neural networks are used to represent fields of signals. In IFA, feature vectors are viewed as representing a 2D field of information. Given a query coordinate, nearby feature vectors with their relative coordinates are taken from the multi-level feature maps and then fed into an MLP to generate the corresponding output. As such, IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps at arbitrary resolutions. We demonstrate the efficacy of IFA on multiple datasets, including Cityscapes, PASCAL Context, and ADE20K. Our method can be combined with various architectures to improve them, and it achieves a state-of-the-art computation-accuracy trade-off on common benchmarks. Code will be made available at https://github.com/hzhupku/IFA.
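The query step described in this abstract can be sketched as follows. This is an illustrative simplification, not the released IFA code: it takes only the single nearest feature vector per level (the abstract says "nearby" vectors), assumes feature maps cover the unit square with cell centers at half-integer offsets, and uses a tiny hand-written ReLU MLP with made-up weights. All names (`nearest_feature`, `build_mlp_input`, `mlp`) are hypothetical.

```python
def nearest_feature(fmap, x, y):
    """Return the feature vector of the cell containing (x, y) and the query's
    coordinate relative to that cell's center.
    fmap is an H x W grid of feature vectors assumed to cover [0, 1)^2."""
    h, w = len(fmap), len(fmap[0])
    i = min(int(y * h), h - 1)   # row index
    j = min(int(x * w), w - 1)   # column index
    cx, cy = (j + 0.5) / w, (i + 0.5) / h
    return fmap[i][j], (x - cx, y - cy)

def build_mlp_input(fmaps, x, y):
    """Concatenate, over all levels, the nearest feature and its relative
    coordinate. No feature map is resampled to a common resolution."""
    vec = []
    for fmap in fmaps:
        feat, (dx, dy) = nearest_feature(fmap, x, y)
        vec.extend(feat)
        vec.extend((dx, dy))
    return vec

def mlp(vec, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, written out with plain lists."""
    hidden = [max(0.0, sum(wi * v for wi, v in zip(row, vec)) + b)
              for row, b in zip(w1, b1)]
    return [sum(wi * hv for wi, hv in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

# Two levels with one channel each: a coarse 1 x 1 map and a fine 2 x 2 map.
coarse = [[[1.0]]]
fine = [[[0.0], [1.0]],
        [[2.0], [3.0]]]
vec = build_mlp_input([coarse, fine], 0.75, 0.25)

# Made-up weights, for illustration only.
w1, b1 = [[1.0] * 6, [1.0] * 6], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [0.0]
out = mlp(vec, w1, b1, w2, b2)
```

Because the query coordinate is continuous, the same function can be evaluated on any output grid, which is how a coordinate-based formulation yields segmentation maps at arbitrary resolutions.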

Preprint, 2022, arXiv

Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation

This paper presents a novel framework to integrate both semantic and instance contexts for panoptic segmentation. In existing works, it is common to use a shared backbone to extract features for both things (countable classes such as vehicles) and stuff (uncountable classes such as roads). This, however, fails to capture the rich relations among them, which can be utilized to enhance visual understanding and segmentation performance. To address this shortcoming, we propose a novel Panoptic, Instance, and Semantic Relations (PISR) module to exploit such contexts. First, we generate panoptic encodings to summarize key features of the semantic classes and predicted instances. A Panoptic Relational Attention (PRA) module is then applied to the encodings and the global feature map from the backbone. It produces a feature map that captures 1) the relations across semantic classes and instances and 2) the relations between these panoptic categories and spatial features. PISR also automatically learns to focus on the more important instances, making it robust to the number of instances used in the relational attention module. Moreover, PISR is a general module that can be applied to any existing panoptic segmentation architecture. Through extensive evaluations on panoptic segmentation benchmarks like Cityscapes, COCO, and ADE20K, we show that PISR attains considerable improvements over existing approaches.

Preprint, 2020, arXiv

A Finsler type Lipschitz optimal transport metric for a quasilinear wave equation

We consider the global well-posedness of weak energy-conservative solutions to a general quasilinear wave equation through a variational principle, where solutions may form finite-time cusp singularities when the energy concentrates. As the main result of this paper, we construct a Finsler type optimal transport metric and prove that the solution flow is Lipschitz under this metric. We also prove a generic regularity result by applying Thom's transversality theorem, and then find piecewise smooth transportation paths among a dense set of solutions. The results in this paper hold for large data solutions, with no restriction on the size of solutions.

Preprint, 2020, arXiv

Singularity formation for radially symmetric expanding wave of Compressible Euler Equations

In this paper, for the compressible Euler equations in multiple space dimensions, we prove the breakdown of classical solutions for a large class of initial data by tracking the propagation of radially symmetric expanding waves that include compression. The singularity formation corresponds to finite-time shock formation. We also provide some new global sup-norm estimates on the velocity and density functions for classical solutions. The results in this paper place no restriction on the size of solutions and hence are large data results.

Preprint, 2016, arXiv

Lipschitz metric for the Novikov equation

We consider the Lipschitz continuous dependence of solutions of the Novikov equation on the initial data. In particular, we construct a Finsler type optimal transport metric which renders the solution map Lipschitz continuous on bounded sets of $H^1(\mathbb{R})\cap W^{1,4}(\mathbb{R})$, although it is not Lipschitz continuous under the natural Sobolev metric from the energy law due to finite-time gradient blowup. By an application of Thom's transversality theorem, we also prove that when the initial data are in an open dense set of $H^1(\mathbb{R})\cap W^{1,4}(\mathbb{R})$, the solution is piecewise smooth. This generic regularity result helps us extend the Lipschitz continuous metric to general weak solutions.

Preprint, 2016, arXiv

Motion and Communication Co-optimization with Path Planning and Online Channel Estimation

This paper considers the problem of optimally balancing motion energy and communication transmission energy of a mobile robot tasked with transmitting a given number of data bits to a remote station, while navigating to a prespecified destination in a given amount of time. The problem is cast in the setting of optimal control, where the robot has to choose its path, acceleration, and transmission rate along the path so as to minimize the energy required for transmission and motion, while satisfying various power and communication constraints. We use realistic models for the robot's channel estimation, motion dynamics, and power and energy costs. The main contribution of the paper is to show how to co-optimize the robot's path along with other communication and motion variables. Two versions of the problem are solved: the first is defined offline by assuming that all the channel measurements are taken before the robot starts moving, while in the second the channel estimation is updated while the robot is in motion, and hence it is solved online. In both cases we utilize an in-house algorithm that computes near-optimal solutions in little time, which enables its use in the online setting. The optimization strategy is described in detail and validated by simulation of realistic scenarios.
