Source author record

Shen Li

Shen Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Robotics; Computer Vision; math.RT; math.RA; physics.optics; Artificial Intelligence; Computation and Language; cond-mat.mtrl-sci; cond-mat.soft; Distributed, Parallel, and Cluster Computing; eess.SY; nlin.PS; q-fin.PM; quant-ph; Systems and Control

Catalog footprint

What is connected

19 works

16 topics

4 close collaborators

preprint · 2026 · arXiv

C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving

Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-risk situations. While vision-language models (VLMs) offer promising approaches for safe decision-making in these environments, most current approaches lack reflective and causal reasoning, thereby limiting their overall robustness. To address this, we propose a counterfactual chain-of-thought (C-CoT) framework that leverages VLMs to decompose driving decisions into five sequential stages: scene description, critical object identification, risk prediction, counterfactual risk reasoning, and final action planning. Within the counterfactual reasoning stage, we introduce a structured meta-action evaluation tree to explicitly assess the potential consequences of alternative action combinations. This self-reflective reasoning establishes causal links between action choices and safety outcomes, improving robustness in long-tail and out-of-distribution scenarios. To validate our approach, we construct the DeepAccident-CCoT dataset based on the DeepAccident benchmark and fine-tune a Qwen2.5-VL (7B) model using low-rank adaptation. Our model achieves a risk prediction recall of 81.9%, reduces the collision rate to 3.52%, and lowers L2 error to 1.98 m. Ablation studies further confirm the critical role of counterfactual reasoning and the meta-action evaluation tree in enhancing safety and interpretability.

preprint · 2026 · arXiv

SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving

End-to-end autonomous driving systems excel in common scenarios but struggle with safety-critical long-tail cases. Vision-Language-Action (VLA) models are promising due to their strong reasoning capabilities. However, most VLA-based approaches rely on positive expert demonstrations, rarely exploiting negative samples, leading to insufficient understanding of risky behaviors and safety boundaries. To address this limitation, we propose SafeAlign-VLA, a unified negative-enhanced safe alignment framework that incorporates negative data into supervised learning and reinforcement learning. First, we develop a counterfactual safety pairing paradigm to generate structured safety labels and counterfactual positive trajectories from risky scenarios via counterfactual reasoning. Then, a two-stage training strategy is adopted: negative-enhanced supervised fine-tuning for failure feedback and trajectory correction, followed by anchor-based group relative policy optimization that uses positive and negative trajectories as contrastive anchors to steer sampling and penalize high-risk behaviors via group-relative advantages. Experiments on NAVSIM and DeepAccident validate the proposed framework. SafeAlign-VLA achieves 89.1 PDMS on the NAVSIM v1 test set, improving over the baseline without negative data by 1.3%. On DeepAccident, it reduces the collision rate to 3.36%, while achieving 84.2% language accuracy and 85.8% risk prediction accuracy. These results demonstrate the effectiveness of the proposed negative-enhanced safe alignment framework for safe and robust autonomous driving.

preprint · 2022 · arXiv

Neural PCA for Flow-Based Representation Learning

Of particular interest is to discover useful representations solely from observations in an unsupervised generative manner. However, the question of whether existing normalizing flows provide effective representations for downstream tasks remains mostly unanswered despite their strong ability for sample generation and density estimation. This paper investigates this problem for such a family of generative models that admits exact invertibility. We propose Neural Principal Component Analysis (Neural-PCA) that operates in full dimensionality while capturing principal components in descending order. Without exploiting any label information, the principal components recovered store the most informative elements in their leading dimensions and leave the negligible in the trailing ones, allowing for clear performance improvements of 5%-10% in downstream tasks. Such improvements are empirically found consistent irrespective of the number of latent trailing dimensions dropped. Our work suggests that necessary inductive bias be introduced into generative modelling when representation quality is of interest.

preprint · 2022 · arXiv

Parameter-Efficient Sparsity for Large Language Models Fine-Tuning

With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models. While most research focuses on how to accurately retain appropriate weights while maintaining the performance of the compressed model, there are challenges in the computational overhead and memory footprint of sparse training when compressing large-scale language models. To address this problem, we propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training in downstream tasks. Specifically, we first combine the data-free and data-driven criteria to efficiently and accurately measure the importance of weights. Then we investigate the intrinsic redundancy of data-driven weight importance and derive two obvious characteristics, i.e., low-rankness and structuredness. Based on that, two groups of small matrices are introduced to compute the data-driven importance of weights, instead of using the original large importance score matrix, which therefore makes the sparse training resource-efficient and parameter-efficient. Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) on dozens of datasets demonstrate PST performs on par with or better than previous sparsity methods, despite only training a small number of parameters. For instance, compared with previous sparsity methods, our PST only requires 1.5% trainable parameters to achieve comparable performance on BERT.

preprint · 2022 · arXiv

Set-based State Estimation with Probabilistic Consistency Guarantee under Epistemic Uncertainty

Consistent state estimation is challenging, especially under the epistemic uncertainties arising from learned (nonlinear) dynamic and observation models. In this work, we propose a set-based estimation algorithm, named Gaussian Process-Zonotopic Kalman Filter (GP-ZKF), that produces zonotopic state estimates while respecting both the epistemic uncertainties in the learned models and aleatoric uncertainties. Our method guarantees probabilistic consistency, in the sense that the true states are bounded by sets (zonotopes) across all time steps, with high probability. We formally relate GP-ZKF with the corresponding stochastic approach, GP-EKF, in the case of learned (nonlinear) models. In particular, when linearization errors and aleatoric uncertainties are omitted and epistemic uncertainties are simplified, GP-ZKF reduces to GP-EKF. We empirically demonstrate our method's efficacy in both a simulated pendulum domain and a real-world robot-assisted dressing domain, where GP-ZKF produced more consistent and less conservative set-based estimates than all baseline stochastic methods.

preprint · 2021 · arXiv

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers

The size of Transformer models is growing at an unprecedented pace. It took less than one year to reach trillion-level parameters after the release of GPT-3 (175B). Training such models requires both substantial engineering efforts and enormous computing resources, which are luxuries most research teams cannot afford. In this paper, we propose PipeTransformer, which leverages automated and elastic pipelining and data parallelism for efficient distributed training of Transformer models. PipeTransformer automatically adjusts the pipelining and data parallelism by identifying and freezing some layers during the training, and instead allocates resources for training of the remaining active layers. More specifically, PipeTransformer dynamically excludes converged layers from the pipeline, packs active layers into fewer GPUs, and forks more replicas to increase data-parallel width. We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on GLUE and SQuAD datasets. Our results show that PipeTransformer attains a 2.4-fold speedup compared to the state-of-the-art baseline. We also provide various performance analyses for a more comprehensive understanding of our algorithmic and system-wise design. We also develop open-sourced flexible APIs for PipeTransformer, which offer a clean separation among the freeze algorithm, model definitions, and training accelerations, hence allowing it to be applied to other algorithms that require similar freezing strategies.

preprint · 2020 · arXiv

Identifying through Flows for Recovering Latent Representations

Identifiability, or recovery of the true latent representations from which the observed data originates, is de facto a fundamental goal of representation learning. Yet, most deep generative models do not address the question of identifiability, and thus fail to deliver on the promise of the recovery of the true latent sources that generate the observations. Recent work proposed identifiable generative modelling using variational autoencoders (iVAE) with a theory of identifiability. Due to the intractability of KL divergence between variational approximate posterior and the true posterior, however, iVAE has to maximize the evidence lower bound (ELBO) of the marginal likelihood, leading to suboptimal solutions in both theory and practice. In contrast, we propose an identifiable framework for estimating latent representations using a flow-based model (iFlow). Our approach directly maximizes the marginal likelihood, allowing for theoretical guarantees on identifiability, thereby dispensing with variational approximations. We derive its optimization objective in analytical form, making it possible to train iFlow in an end-to-end manner. Simulations on synthetic data validate the correctness and effectiveness of our proposed method and demonstrate its practical advantages over other existing methods.

preprint · 2020 · arXiv

Planning With Uncertain Specifications (PUnS)

Reward engineering is crucial to high performance in reinforcement learning systems. Prior research into reward design has largely focused on Markovian functions representing the reward. While there has been research into expressing non-Markov rewards as linear temporal logic (LTL) formulas, this has focused on task specifications directly defined by the user. However, in many real-world applications, task specifications are ambiguous, and can only be expressed as a belief over LTL formulas. In this paper, we introduce planning with uncertain specifications (PUnS), a novel formulation that addresses the challenge posed by non-Markovian specifications expressed as beliefs over LTL formulas. We present four criteria that capture the semantics of satisfying a belief over specifications for different applications, and analyze the qualitative implications of these criteria within a synthetic domain. We demonstrate the existence of an equivalent Markov decision process (MDP) for any instance of PUnS. Finally, we demonstrate our approach on the real-world task of setting a dinner table automatically with a robot that inferred task specifications from human demonstrations.

preprint · 2020 · arXiv

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. Data parallelism has emerged as a popular solution for distributed training thanks to its straightforward principle and broad applicability. In general, the technique of distributed data parallelism replicates the model on every computational resource to generate gradients independently and then communicates those gradients at each iteration to keep model replicas consistent. Despite the conceptual simplicity of the technique, the subtle dependencies between computation and communication make it non-trivial to optimize the distributed training efficiency. As of v1.5, PyTorch natively provides several techniques to accelerate distributed data parallel, including bucketing gradients, overlapping computation with communication, and skipping gradient synchronization. Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.
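The data-parallel principle described in this abstract (replicate the model, compute gradients independently, then communicate them so replicas stay consistent) can be illustrated with a toy sketch. This is not PyTorch's implementation: the function names `local_gradient`, `all_reduce_mean`, and `data_parallel_step`, the one-parameter model y = w * x, and the data are all assumptions chosen for illustration, and the all-reduce is simulated in a single process.

```python
# Toy, single-process illustration of distributed data parallelism.
# All names here are hypothetical; this is not the PyTorch DDP API.

def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every replica receives the mean value."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_step(weights, shards, lr):
    """One step: local gradients, gradient averaging, identical updates."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    synced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, synced)]


# Data generated by y = 2 * x, split into two equal-sized shards.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

weights = [0.0, 0.0]  # both replicas start from the same parameter
for _ in range(50):
    weights = data_parallel_step(weights, shards, lr=0.05)
```

With equal-sized shards, the averaged shard gradients equal the full-batch gradient, so the replicas follow the same trajectory as single-device training and remain identical after every step.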

preprint · 2020 · arXiv

Quantum Inspired Word Representation and Computation

Word meaning has different aspects, while the existing word representation "compresses" these aspects into a single vector, and it needs further analysis to recover the information in different dimensions. Inspired by quantum probability, we represent words as density matrices, which are inherently capable of representing mixed states. The experiment shows that the density matrix representation can effectively capture different aspects of word meaning while maintaining comparable reliability with the vector representation. Furthermore, we propose a novel method to combine the coherent summation and incoherent summation in the computation of both vectors and density matrices. It achieves consistent improvement on the word analogy task.
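As a rough illustration of why a density matrix can hold several aspects of meaning at once, the sketch below builds one as a probability-weighted sum of projectors onto unit "sense" vectors. This is the standard textbook construction of a mixed state, not necessarily the paper's training procedure; the two-dimensional sense vectors, the words, and the helper names (`outer`, `density_matrix`, `trace`, `purity`) are assumptions made for the example.

```python
# Minimal sketch of a mixed-state density matrix; helper names, the words,
# and the sense vectors are hypothetical illustrations only.

def outer(v):
    """Outer product |v><v| of a real vector with itself."""
    return [[a * b for b in v] for a in v]


def density_matrix(senses):
    """Build rho = sum_i p_i |v_i><v_i| from (probability, unit vector) pairs."""
    dim = len(senses[0][1])
    rho = [[0.0] * dim for _ in range(dim)]
    for p, v in senses:
        proj = outer(v)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * proj[i][j]
    return rho


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def purity(rho):
    """Tr(rho^2): equals 1 for a pure state, less than 1 for a mixed state."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))


# A word with two equally likely senses (mixed state) versus a word
# with a single sense (pure state).
bank = density_matrix([(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])])
river = density_matrix([(1.0, [0.0, 1.0])])
```

Both matrices have unit trace, but only the single-sense word has purity 1; a single vector has no comparable way to record that a word mixes several senses.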

preprint · 2018 · arXiv

Percolation Phase Transition from Ionic Liquids to Ionic Liquid Crystals

As typical complex liquids, ionic liquids (ILs) exhibit phases beyond the description of simple liquid theories. In particular, with an intermediate cationic side-chain length, ILs can form the nanoscale segregated liquid (NSL) phase, which will eventually transform into the ionic liquid crystal (ILC) phase when the side chains are adequately long. However, the microscopic mechanism of this transformation is still unclear. In this work, by means of coarse-grained molecular dynamics simulation, we show that, with increasing cationic side-chain length, some local pieces of non-polar domains are gradually formed by side chains aligned in parallel inside the NSL phase, before an abrupt percolation phase transition happens when the system transforms into the ILC phase, manifesting that it is a critical phenomenon. Percolation phase transition is applied to ILs, providing new insights into many recent observations both in experiments and simulations.

preprint · 2016 · arXiv

A new characterization of Auslander algebras

Let $Λ$ be a finite dimensional Auslander algebra. For a $Λ$-module $M$, we prove that the projective dimension of $M$ is at most one if and only if the projective dimension of its socle soc\,$M$ is at most one. As an application, we give a new characterization of Auslander algebra $Λ$, and prove that a finite dimensional algebra $Λ$ is an Auslander algebra provided its global dimension gl.d\,$Λ\leq2$ and an injective $Λ$-module is projective if and only if the projective dimension of its socle is at most one.

preprint · 2016 · arXiv

Breather-to-soliton and rogue wave-to-soliton transitions in a resonant erbium-doped fiber system with higher-order effects

Under investigation in this paper is the higher-order nonlinear Schrödinger and Maxwell-Bloch (HNLSMB) system, which describes wave propagation in an erbium-doped nonlinear fiber with higher-order effects including the fourth-order dispersion and quintic non-Kerr nonlinearity. The breather and rogue wave (RW) solutions are shown to be convertible into various soliton solutions including the multipeak soliton, periodic wave, antidark soliton, M-shaped soliton, and W-shaped soliton. In addition, under different values of the higher-order effects, the locus of the eigenvalues on the complex plane which converts breathers or RWs into solitons is calculated.

preprint · 2016 · arXiv

Optical far-field super-resolution microscopy using nitrogen vacancy center ensemble in bulk diamond

We demonstrate optical far-field super-resolution microscopy using an array of nitrogen vacancy centers in bulk diamond as near-field optical probes. The local optical field, which transmits through the nanostructures on the diamond surface, is measured by detecting the charge state conversion of the nitrogen vacancy centers. The nitrogen vacancy centers are located with a spatial resolution of 6.1 nm using charge state depletion nanoscopy. The nanostructures on the surface of the diamond are then imaged with resolution below the optical diffraction limit. The results offer an approach to building general-purpose optical super-resolution microscopy and a convenient platform for high spatial resolution quantum sensing with nitrogen vacancy centers.

preprint · 2016 · arXiv

Relative contravariantly finite subcategories and relative tilting modules

Let $A$ be a finite dimensional algebra over an algebraically closed field $k$. Let $T$ be a tilting $A$-module and $B={\rm End}_A\ T$ be the endomorphism algebra of $T$. In this paper, we consider the correspondence between the tilting $A$-modules and the tilting $B$-modules, and we prove that there is a one-one correspondence between the basic $T$-tilting $A$-modules in $T^{\perp}$ and the basic tilting $B$-modules in $^{\perp}(D_BT)$. Moreover, we show that there is a one-one correspondence between the $T$-contravariantly finite $T$-resolving subcategories of $T^{\perp}$ and the basic $T$-tilting $A$-modules contained in $T^{\perp}$. As an application, we show that there is a one-one correspondence between the basic tilting $A$-modules in $T^{\perp}$ and the basic tilting $B$-modules in $^{\perp}(D_BT)$ if $A$ is a $1$-Gorenstein algebra or an $m$-replicated algebra over a finite dimensional hereditary algebra.

preprint · 2015 · arXiv

Some applications of $τ$-tilting theory

Let $A$ be a finite dimensional algebra over an algebraically closed field $k$, and $M$ be a partial tilting $A$-module. We prove that the Bongartz $τ$-tilting complement of $M$ coincides with its Bongartz complement, and then we give a new proof that every almost complete tilting $A$-module has at most two complements. Let $A=kQ$ be a path algebra. We prove that the support $τ$-tilting quiver $\overrightarrow{Q}({\rm s}τ$-${\rm tilt} A)$ of $A$ is connected. As an application, we investigate the conjecture of Happel and Unger in [9] which claims that each connected component of the tilting quiver $\overrightarrow{Q}({\rm tilt} A)$ contains only finitely many non-saturated vertices. We prove that this conjecture is true for $Q$ being all Dynkin and Euclidean quivers and wild quivers with two or three vertices, and we also give an example indicating that this conjecture is not true if $Q$ is a wild quiver with four vertices.

preprint · 2013 · arXiv

Portfolio Optimization under Small Transaction Costs: a Convex Duality Approach

We consider an investor with constant absolute risk aversion who trades a risky asset with general Ito dynamics, in the presence of small proportional transaction costs. Kallsen and Muhle-Karbe (2012) formally derived the leading-order optimal trading policy and the associated welfare impact of transaction costs. In the present paper, we carry out a convex duality approach facilitated by the concept of shadow price processes in order to verify the main results of Kallsen and Muhle-Karbe under well-defined regularity conditions.

preprint · 2013 · arXiv

Protocol based on compressed sensing for high-speed authentication and cryptographic key distribution over a multiparty optical network

We present a protocol for the amplification and distribution of a one-time-pad cryptographic key over a point-to-multipoint optical network based on computational ghost imaging (GI) and compressed sensing (CS). It is shown experimentally that CS imaging can perform faster authentication and increase the key generation rate by an order of magnitude compared with the scheme using computational GI alone. The protocol is applicable for any number of legitimate users; thus, the scheme could be used in real intercity networks where high speed and high security are crucial.

preprint · 2007 · arXiv

Van der Waals density functional: Self-consistent potential and the nature of the van der Waals bond

We derive the exchange-correlation potential corresponding to the nonlocal van der Waals density functional [M. Dion, H. Rydberg, E. Schroder, D. C. Langreth, and B. I. Lundqvist, Phys. Rev. Lett. 92, 246401 (2004)]. We use this potential for a self-consistent calculation of the ground state properties of a number of van der Waals complexes as well as crystalline silicon. For the latter, where little or no van der Waals interaction is expected, we find that the results are mostly determined by semilocal exchange and correlation as in standard generalized gradient approximations (GGA), with the fully nonlocal term giving little effect. On the other hand, our results for the van der Waals complexes show that the self-consistency has little effect at equilibrium separations. This finding validates previous calculations with the same functional that treated the fully nonlocal term as a post GGA perturbation. A comparison of our results with wave-function calculations demonstrates the usefulness of our approach. The exchange-correlation potential also allows us to calculate Hellmann-Feynman forces, hence providing the means for efficient geometry relaxations as well as unleashing the potential use of other standard techniques that depend on the self-consistent charge distribution. The nature of the van der Waals bond is discussed in terms of the self-consistent bonding charge.