Researcher profile

Tianyu Li

Tianyu Li contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
10 topics
4 close collaborators

Published work

13 published items

preprint · 2026 · arXiv

123D: Unifying Multi-Modal Autonomous Driving Data at Scale

The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsistencies in annotation conventions prevent training or measuring generalization across multiple datasets. We present 123D, an open-source framework that unifies such multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event stream with no prescribed rate, enabling synchronous or asynchronous access across arbitrary datasets. Using 123D, we consolidate eight real-world driving datasets spanning 3,300 hours and 90,000 kilometers, together with a synthetic dataset with configurable collection scripts, and provide tools for data analysis and visualization. We conduct a systematic study comparing annotation statistics and assessing each dataset's pose and calibration accuracy. Further, we showcase two applications 123D enables: cross-dataset 3D object detection transfer and reinforcement learning for planning, and offer recommendations for future directions. Code and documentation are available at https://github.com/kesai-labs/py123d.
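The idea of storing each modality as an independent timestamped event stream can be illustrated with a minimal sketch. This is not the py123d API: the `EventStream` class, the `synchronized` helper, and the nearest-timestamp matching rule are all hypothetical choices made only for illustration.

```python
from bisect import bisect_left


class EventStream:
    """Hypothetical container: one modality as timestamp-sorted (time, payload) events."""

    def __init__(self, events):
        self.events = sorted(events, key=lambda e: e[0])
        self.times = [t for t, _ in self.events]

    def nearest(self, t):
        """Return the event whose timestamp is closest to t."""
        if not self.events:
            raise ValueError("empty stream")
        i = bisect_left(self.times, t)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = self.events[max(i - 1, 0): i + 1]
        return min(candidates, key=lambda e: abs(e[0] - t))


def synchronized(reference, others):
    """Yield one frame per reference event, paired with the nearest event of each other stream."""
    for t, payload in reference.events:
        yield t, payload, {name: s.nearest(t) for name, s in others.items()}


# Two streams with no shared rate or clock alignment (made-up data).
camera = EventStream([(0.00, "img0"), (0.10, "img1"), (0.20, "img2")])
lidar = EventStream([(0.02, "scan0"), (0.12, "scan1"), (0.22, "scan2")])

frames = list(synchronized(camera, {"lidar": lidar}))
```

Asynchronous access in this sketch is simply iterating over a single stream's `events`; synchronous access picks a reference stream and looks up the other streams at its timestamps.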

preprint · 2026 · arXiv

EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation

Humans exhibit adaptive, context-sensitive responses to egocentric visual input. However, faithfully modeling such reactions from egocentric video remains challenging due to the dual requirements of strictly causal generation and precise 3D spatial alignment. To tackle this problem, we first construct the Human Reaction Dataset (HRD) to address data scarcity and misalignment by building a spatially aligned egocentric video-reaction dataset, as existing datasets (e.g., ViMo) suffer from significant spatial inconsistency between the egocentric video and reaction motion, e.g., dynamically moving motions are always paired with fixed-camera videos. Leveraging HRD, we present EgoReAct, the first autoregressive framework that generates 3D-aligned human reaction motions from egocentric video streams in real-time. We first compress the reaction motion into a compact yet expressive latent space via a Vector Quantised-Variational AutoEncoder and then train a Generative Pre-trained Transformer for reaction generation from the visual input. EgoReAct incorporates 3D dynamic features, i.e., metric depth, and head dynamics during the generation, which effectively enhance spatial grounding. Extensive experiments demonstrate that EgoReAct achieves remarkably higher realism, spatial consistency, and generation efficiency compared with prior methods, while maintaining strict causality during generation. We will release code, models, and data upon acceptance.

preprint · 2026 · arXiv

UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model

Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.

preprint · 2024 · arXiv

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

A map, as crucial information for downstream applications of an autonomous driving system, is usually represented as lanelines or centerlines. However, existing literature on map learning primarily focuses on either detecting geometry-based lanelines or perceiving topology relationships of centerlines. Both of these methods ignore the intrinsic relationship between lanelines and centerlines: lanelines bind centerlines. While simply predicting both types of lane in one model creates mutually exclusive learning objectives, we advocate the lane segment as a new representation that seamlessly incorporates both geometry and topology information. Thus, we introduce LaneSegNet, the first end-to-end mapping network generating lane segments to obtain a complete representation of the road structure. Our algorithm features two key modifications. One is a lane attention module to capture pivotal region details within the long-range feature space. The other is an identical initialization strategy for reference points, which enhances the learning of positional priors for lane attention. On the OpenLane-V2 dataset, LaneSegNet outperforms previous counterparts by a substantial gain across three tasks, i.e., map element detection (+4.8 mAP), centerline perception (+6.9 DET_l), and the newly defined one, lane segment perception (+5.6 mAP). Furthermore, it obtains a real-time inference speed of 14.7 FPS. Code is accessible at https://github.com/OpenDriveLab/LaneSegNet.

preprint · 2022 · arXiv

Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning

In this paper, we present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks, which encompass a set of optimization techniques for high-order tensors used in quantum physics and numerical analysis. We first present an intrinsic relation between WFA and the tensor train decomposition, a particular form of tensor network. This relation allows us to exhibit a novel low-rank structure of the Hankel matrix of a function computed by a WFA and to design an efficient spectral learning algorithm leveraging this structure to scale the algorithm up to very large Hankel matrices. We then unravel a fundamental connection between WFA and second-order recurrent neural networks (2-RNN): in the case of sequences of discrete symbols, WFA and 2-RNN with linear activation functions are expressively equivalent. Leveraging this equivalence result combined with the classical spectral learning algorithm for weighted automata, we introduce the first provable learning algorithm for linear 2-RNN defined over sequences of continuous input vectors. This algorithm relies on estimating low-rank sub-blocks of the Hankel tensor, from which the parameters of a linear 2-RNN can be provably recovered. The performance of the proposed learning algorithm is assessed in a simulation study on both synthetic and real-world data.
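The stated equivalence between WFA and linear 2-RNN on discrete sequences can be checked on a toy case. The sketch below is an illustration under assumptions, not the paper's code: the two-state automaton (which counts occurrences of the symbol "a"), the function names, and the one-hot encoding are all chosen here for the example. Stacking the WFA transition matrices into a third-order tensor and feeding one-hot inputs makes the bilinear recurrence reproduce the WFA value.

```python
def wfa_value(alpha, transitions, omega, word):
    """Compute f(word) = alpha^T A_{x1} ... A_{xk} omega for a WFA."""
    state = list(alpha)
    for symbol in word:
        A = transitions[symbol]
        # Row vector times matrix: new_state[j] = sum_i state[i] * A[i][j]
        state = [sum(state[i] * A[i][j] for i in range(len(state)))
                 for j in range(len(A[0]))]
    return sum(s * w for s, w in zip(state, omega))


def linear_2rnn_value(h0, tensor, omega, inputs):
    """Linear second-order RNN: h_t[j] = sum_{i,s} h_{t-1}[i] * x_t[s] * tensor[i][s][j]."""
    h = list(h0)
    for x in inputs:
        h = [sum(h[i] * x[s] * tensor[i][s][j]
                 for i in range(len(h)) for s in range(len(x)))
             for j in range(len(h))]
    return sum(a * b for a, b in zip(h, omega))


# Toy WFA over {a, b} (an assumption for this example): f(word) = number of "a" symbols.
alpha = [1, 0]
omega = [0, 1]
transitions = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}

# Stack the transition matrices into a tensor indexed as [state_in][symbol][state_out].
symbols = ["a", "b"]
tensor = [[[transitions[sym][i][j] for j in range(2)] for sym in symbols]
          for i in range(2)]
one_hot = {"a": [1, 0], "b": [0, 1]}

word = "abaa"
wfa_result = wfa_value(alpha, transitions, omega, word)
rnn_result = linear_2rnn_value(alpha, tensor, omega, [one_hot[c] for c in word])
# Both results equal 3, the number of "a" symbols in the word.
```

Because `linear_2rnn_value` accepts arbitrary real-valued input vectors, the same function also covers the continuous-input setting the abstract mentions, where no WFA counterpart is defined.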

preprint · 2022 · arXiv

FastMimic: Model-based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion

Robots operating in human environments need various skills, like slow and fast walking, turning, side-stepping, and many more. However, building robot controllers that can exhibit such a large range of behaviors is a challenging problem that requires tedious investigation for every task. We present a unified model-based control algorithm for imitating different animal gaits without expensive simulation training or real-world fine-tuning. Our method consists of stance and swing leg controllers using a centroidal dynamics model augmented with online adaptation techniques. We also develop a whole-body trajectory optimization procedure to fix the kinematic infeasibility of the reference animal motions. We demonstrate that our universal data-driven model-based controller can seamlessly imitate various motor skills, including trotting, pacing, turning, and side-stepping. It also shows better tracking capabilities in simulation and the real world against several baselines, including another model-based imitation controller and a learning-based motion imitation technique.

preprint · 2022 · arXiv

foREST: A Tree-based Approach for Fuzzing RESTful APIs

Representational state transfer (REST) is an architecture widely employed by web applications and cloud services. Users can invoke such services according to the specification of their application interfaces, namely RESTful APIs. Existing approaches for fuzzing RESTful APIs are generally based on classic API-dependency graphs. However, such dependencies are inefficient for REST services due to the explosion of dependencies among APIs. In this paper, we propose a novel tree-based approach that can better capture the essential dependencies and largely improve the efficiency of RESTful API fuzzing. In particular, the hierarchical information of the endpoints across multiple APIs enables us to construct an API tree, and the relationships of tree nodes can indicate the priority of resource dependencies, e.g., it is more likely that a node depends on its parent node than on its offspring or siblings. In the evaluation, we first confirm that such a tree-based approach is more efficient than traditional graph-based approaches. We then apply our tool to fuzz two real-world RESTful services and compare the performance with two state-of-the-art tools, EvoMaster and RESTler. Our results show that foREST improves code coverage in all experiments, with gains ranging from 11.5% to 82.5%. In addition, our tool finds 11 previously unknown bugs.
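The API tree described above can be sketched from endpoint paths alone. The following is a simplified illustration, not the foREST implementation: the function names, the example endpoints, and the rule "the parent is the longest listed endpoint that is a proper path prefix" are assumptions, and the sketch only walks ancestors, whereas the abstract describes parents as more likely dependencies rather than the only ones.

```python
def build_api_tree(paths):
    """Map each endpoint to its parent: the longest other endpoint that is a proper path prefix."""
    split = {p: [seg for seg in p.split("/") if seg] for p in paths}
    parent = {}
    for p, segs in split.items():
        best = None
        for q, qsegs in split.items():
            is_proper_prefix = len(qsegs) < len(segs) and segs[:len(qsegs)] == qsegs
            if is_proper_prefix and (best is None or len(qsegs) > len(split[best])):
                best = q
        parent[p] = best  # None marks a root of the tree
    return parent


def dependency_order(path, parent):
    """List candidate resource producers for `path`, nearest ancestor first."""
    order = []
    node = parent[path]
    while node is not None:
        order.append(node)
        node = parent[node]
    return order


# Hypothetical endpoints, used only to exercise the sketch.
endpoints = ["/projects", "/projects/{id}", "/projects/{id}/issues", "/users"]
tree = build_api_tree(endpoints)
candidates = dependency_order("/projects/{id}/issues", tree)
```

For an endpoint with k ancestors, this ordering tries at most k candidates before falling back to other nodes, instead of searching every edge of a dependency graph.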

preprint · 2022 · arXiv

UserBERT: Modeling Long- and Short-Term User Preferences via Self-Supervision

E-commerce platforms generate vast amounts of customer behavior data, such as clicks and purchases, from millions of unique users every day. However, effectively using this data for behavior understanding tasks is challenging because there are usually not enough labels to learn from all users in a supervised manner. This paper extends the BERT model to e-commerce user data for pre-training representations in a self-supervised manner, viewing user actions in sequences as analogous to words in sentences. Further, our model adopts a unified structure to simultaneously learn from long-term and short-term user behavior, as well as user attributes. We propose methods for the tokenization of different types of user behavior sequences, the generation of input representation vectors, and a novel pretext task to enable the pre-trained model to learn from its own input, eliminating the need for labeled training data. Extensive experiments demonstrate that the learned representations result in significant improvements when transferred to three different real-world tasks, particularly compared to task-specific modeling and multi-task representation learning.

preprint · 2021 · arXiv

Detecting non-Bloch topological invariants in quantum dynamics

Non-Bloch topological invariants preserve the bulk-boundary correspondence in non-Hermitian topological systems, and are a key concept in the contemporary study of non-Hermitian topology. Here we report the dynamic detection of non-Bloch topological invariants in single-photon quantum walks, revealed through the biorthogonal chiral displacement, and crosschecked with the dynamic spin textures in the generalized quasimomentum-time domain following a quantum quench. Both detection schemes are robust against symmetry-preserving disorders, and yield consistent results with theoretical predictions. Our experiments are performed far away from any boundaries, and therefore underline non-Bloch topological invariants as intrinsic properties of the system that persist in the thermodynamic limit. Our work sheds new light on the experimental investigation of non-Hermitian topology.

preprint · 2021 · arXiv

Engineering Dissipative Quasicrystals

We discuss the systematic engineering of quasicrystals in open quantum systems where quasiperiodicity is introduced through purely dissipative processes. While the resulting short-time dynamics is governed by non-Hermitian variants of the Aubry-André-Harper model, we demonstrate how phases and phase transitions pertaining to the non-Hermitian quasicrystals fundamentally change the long-time, steady-state-approaching dynamics under the Lindblad master equation. Our schemes are based on an exact mapping between the eigenspectrum of the Liouvillian superoperator and that of the non-Hermitian Hamiltonian, under the condition of quadratic fermionic systems subject to linear dissipation. Our work suggests a systematic route toward engineering exotic quantum dynamics in open systems, based on insights from non-Hermitian physics.

preprint · 2021 · arXiv

Two-dimensional quantum walk with non-Hermitian skin effects

We construct a two-dimensional, discrete-time quantum walk exhibiting non-Hermitian skin effects under open-boundary conditions. As a confirmation of the non-Hermitian bulk-boundary correspondence, we show that the emergence of topological edge states is consistent with Floquet winding numbers calculated using a non-Bloch band theory invoking time-dependent generalized Brillouin zones. Further, the non-Bloch topological invariants associated with quasienergy bands are captured by a non-Hermitian local Chern marker in real space, defined through local biorthogonal eigen wave functions of the non-unitary Floquet operator. Our work should stimulate further studies of non-Hermitian Floquet topological phases where skin effects play a key role.

preprint · 2020 · arXiv

Learning Classifiers on Positive and Unlabeled Data with Policy Gradient

Existing algorithms aiming to learn a binary classifier from positive (P) and unlabeled (U) data generally require estimating the class prior or label noise before building a classification model. However, the estimation and classifier learning are normally conducted in a pipeline instead of being jointly optimized. In this paper, we propose to train the two steps alternately using reinforcement learning. Our proposal adopts a policy network to adaptively make assumptions on the labels of unlabeled data, while a classifier is built upon the output of the policy network and provides rewards to learn a better strategy. The dynamic and interactive training between the policy maker and the classifier can exploit the unlabeled data in a more effective manner and yield a significant improvement in classification performance. Furthermore, we present two different approaches to represent the actions sampled from the policy. The first approach considers continuous actions as soft labels, while the other uses discrete actions as hard assignments of labels for unlabeled examples. We validate the effectiveness of the proposed method on two benchmark datasets as well as one e-commerce dataset. The results show that the proposed method consistently outperforms state-of-the-art methods in various settings.

preprint · 2020 · arXiv

Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats

The proliferation of modern data processing tools has given rise to open-source columnar data formats. The advantage of these formats is that they help organizations avoid repeatedly converting data to a new format for each application. These formats, however, are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) systems. We aim to reduce or even eliminate this process by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format. We introduce relaxations to common analytical data formats to efficiently update records and rely on a lightweight transformation process to convert blocks to a read-optimized layout when they are cold. We also describe how to access data from third-party analytical tools with minimal serialization overhead. To evaluate our work, we implemented our storage engine based on the Apache Arrow format and integrated it into the DB-X DBMS. Our experiments show that our approach achieves comparable performance with dedicated OLTP DBMSs while enabling orders-of-magnitude faster data exports to external data science and machine learning tools than existing methods.