Researcher profile

Liang Feng

Liang Feng contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. We introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning for complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a vision-language model pre-trained for Physical AI, with a diffusion-based trajectory decoder that generates dynamically feasible trajectories in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to enforce reasoning-action consistency and optimize reasoning quality. AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in close encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. Model weights are available at https://huggingface.co/nvidia/Alpamayo-R1-10B with inference code at https://github.com/NVlabs/alpamayo.

preprint2026arXiv

Beyond Fixed Formulas: Data-Driven Linear Predictor for Efficient Diffusion Models

To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a simple data-driven caching framework that replaces fixed coefficients with learnable per-timestep weights. Rapidly trained in ~20 seconds on a single GPU, L2P accurately reconstructs current features from past trajectories. L2P significantly outperforms existing baselines: it achieves a 4.55x FLOPs reduction and 4.15x latency speedup on FLUX.1-dev, and maintains high visual fidelity under up to 7.18x acceleration on Qwen-Image models, where prior methods show noticeable quality degradation. Our results show learning linear predictors is highly effective for efficient DiT inference. Code is available at https://github.com/Aredstone/L2P-Cache.

preprint2024arXiv

Towards Multi-Objective High-Dimensional Feature Selection via Evolutionary Multitasking

Evolutionary Multitasking (EMT) paradigm, an emerging research topic in evolutionary computation, has been successfully applied in solving high-dimensional feature selection (FS) problems recently. However, existing EMT-based FS methods suffer from several limitations, such as a single mode of multitask generation, conducting the same generic evolutionary search for all tasks, relying on implicit transfer mechanisms through sole solution encodings, and employing single-objective transformation, which result in inadequate knowledge acquisition, exploitation, and transfer. To this end, this paper develops a novel EMT framework for multiobjective high-dimensional feature selection problems, namely MO-FSEMT. In particular, multiple auxiliary tasks are constructed by distinct formulation methods to provide diverse search spaces and information representations and then simultaneously addressed with the original task through a multi-slover-based multitask optimization scheme. Each task has an independent population with task-specific representations and is solved using separate evolutionary solvers with different biases and search preferences. A task-specific knowledge transfer mechanism is designed to leverage the advantage information of each task, enabling the discovery and effective transmission of high-quality solutions during the search process. Comprehensive experimental results demonstrate that our MO-FSEMT framework can achieve overall superior performance compared to the state-of-the-art FS methods on 26 datasets. Moreover, the ablation studies verify the contributions of different components of the proposed MO-FSEMT.

preprint2022arXiv

The missing link between standing- and traveling-wave resonators

Optical resonators are structures that utilize wave interference and feedback to confine light in all three dimensions. Depending on the feedback mechanism, resonators can support either standing- or traveling-wave modes. Over the years, the distinction between these two different types of modes has become so prevalent that nowadays it is one of the main characteristics for classifying optical resonators. Here, we show that an intermediate link between these two rather different groups exists. In particular, we introduce a new class of photonic resonators that supports a hybrid optical mode, i.e. at one location along the resonator the electromagnetic fields associated with the mode feature a purely standing-wave pattern, while at a different location, the fields of the same mode represent a pure traveling wave. The proposed concept is general and can be implemented using chip-scale photonics as well as free-space optics. Moreover, it can be extended to other wave phenomena such as microwaves and acoustics.

preprint2021arXiv

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to lookup the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board including the embedding lookup step as well as the complete inference process. Compared to the optimized CPU baseline (16 vCPU, AVX2-enabled), MicroRec achieves 13.8~14.7x speedup on embedding lookup alone and 2.5$~5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines needs milliseconds for inferring a recommendation while MicroRec only takes microseconds, a significant advantage in real-time recommendation systems.

preprint2021arXiv

Multi-Space Evolutionary Search for Large-Scale Optimization

In recent years, to improve the evolutionary algorithms used to solve optimization problems involving a large number of decision variables, many attempts have been made to simplify the problem solution space of a given problem for the evolutionary search. In the literature, the existing approaches can generally be categorized as decomposition-based methods and dimension-reduction-based methods. The former decomposes a large-scale problem into several smaller subproblems, while the latter transforms the original high-dimensional solution space into a low-dimensional space. However, it is worth noting that a given large-scale optimization problem may not always be decomposable, and it is also difficult to guarantee that the global optimum of the original problem is preserved in the reduced low-dimensional problem space. This paper thus proposes a new search paradigm, namely the multi-space evolutionary search, to enhance the existing evolutionary search methods for solving large-scale optimization problems. In contrast to existing approaches that perform an evolutionary search in a single search space, the proposed paradigm is designed to conduct a search in multiple solution spaces that are derived from the given problem, each possessing a unique landscape. The proposed paradigm makes no assumptions about the large-scale optimization problem of interest, such as that the problem is decomposable or that a certain relationship exists among the decision variables. To verify the efficacy of the proposed paradigm, comprehensive empirical studies in comparison to four state-of-the-art algorithms were conducted using the CEC2013 large-scale benchmark problems.

preprint2020arXiv

FP-Stereo: Hardware-Efficient Stereo Vision for Embedded Applications

Fast and accurate depth estimation, or stereo matching, is essential in embedded stereo vision systems, requiring substantial design effort to achieve an appropriate balance among accuracy, speed and hardware cost. To reduce the design effort and achieve the right balance, we propose FP-Stereo for building high-performance stereo matching pipelines on FPGAs automatically. FP-Stereo consists of an open-source hardware-efficient library, allowing designers to obtain the desired implementation instantly. Diverse methods are supported in our library for each stage of the stereo matching pipeline and a series of techniques are developed to exploit the parallelism and reduce the resource overhead. To improve the usability, FP-Stereo can generate synthesizable C code of the FPGA accelerator with our optimized HLS templates automatically. To guide users for the right design choice meeting specific application requirements, detailed comparisons are performed on various configurations of our library to investigate the accuracy/speed/cost trade-off. Experimental results also show that FP-Stereo outperforms the state-of-the-art FPGA design from all aspects, including 6.08% lower error, 2x faster speed, 30% less resource usage and 40% less energy consumption. Compared to GPU designs, FP-Stereo achieves the same accuracy at a competitive speed while consuming much less energy.

preprint2020arXiv

Multi-factorial Optimization for Large-scale Virtual Machine Placement in Cloud Computing

The placement scheme of virtual machines (VMs) to physical servers (PSs) is crucial to lowering operational cost for cloud providers. Evolutionary algorithms (EAs) have been performed promising-solving on virtual machine placement (VMP) problems in the past. However, as growing demand for cloud services, the existing EAs fail to implement in large-scale virtual machine placement (LVMP) problem due to the high time complexity and poor scalability. Recently, the multi-factorial optimization (MFO) technology has surfaced as a new search paradigm in evolutionary computing. It offers the ability to evolve multiple optimization tasks simultaneously during the evolutionary process. This paper aims to apply the MFO technology to the LVMP problem in heterogeneous environment. Firstly, we formulate a deployment cost based VMP problem in the form of the MFO problem. Then, a multi-factorial evolutionary algorithm (MFEA) embedded with greedy-based allocation operator is developed to address the established MFO problem. After that, a re-migration and merge operator is designed to offer the integrated solution of the LVMP problem from the solutions of MFO problem. To assess the effectiveness of our proposed method, the simulation experiments are carried on large-scale and extra large-scale VMs test data sets. The results show that compared with various heuristic methods, our method could shorten optimization time significantly and offer a competitive placement solution for the LVMP problem in heterogeneous environment.

preprint2020arXiv

Scalable Light-Weight Integration of FPGA Based Accelerators with Chip Multi-Processors

Modern multicore systems are migrating from homogeneous systems to heterogeneous systems with accelerator-based computing in order to overcome the barriers of performance and power walls. In this trend, FPGA-based accelerators are becoming increasingly attractive, due to their excellent flexibility and low design cost. In this paper, we propose the architectural support for efficient interfacing between FPGA-based multi-accelerators and chip-multiprocessors (CMPs) connected through the network-on-chip (NoC). Distributed packet receivers and hierarchical packet senders are designed to maintain scalability and reduce the critical path delay under a heavy task load. A dedicated accelerator chaining mechanism is also proposed to facilitate intra-FPGA data reuse among accelerators to circumvent prohibitive communication overhead between the FPGA and processors. In order to evaluate the proposed architecture, a complete system emulation with programmability support is performed using FPGA prototyping. Experimental results demonstrate that the proposed architecture has high-performance, and is light-weight and scalable in characteristics.

preprint2020arXiv

Vehicular Fog Computing Enabled Real-time Collision Warning via Trajectory Calibration

Vehicular fog computing (VFC) has been envisioned as a promising paradigm for enabling a variety of emerging intelligent transportation systems (ITS). However, due to inevitable as well as non-negligible issues in wireless communication, including transmission latency and packet loss, it is still challenging in implementing safety-critical applications, such as real-time collision warning in vehicular networks. In this paper, we present a vehicular fog computing architecture, aiming at supporting effective and real-time collision warning by offloading computation and communication overheads to distributed fog nodes. With the system architecture, we further propose a trajectory calibration based collision warning (TCCW) algorithm along with tailored communication protocols. Specifically, an application-layer vehicular-to-infrastructure (V2I) communication delay is fitted by the Stable distribution with real-world field testing data. Then, a packet loss detection mechanism is designed. Finally, TCCW calibrates real-time vehicle trajectories based on received vehicle status including GPS coordinates, velocity, acceleration, heading direction, as well as the estimation of communication delay and the detection of packet loss. For performance evaluation, we build the simulation model and implement conventional solutions including cloud-based warning and fog-based warning without calibration for comparison. Real-vehicle trajectories are extracted as the input, and the simulation results demonstrate that the effectiveness of TCCW in terms of the highest precision and recall in a wide range of scenarios.