Researcher profile

Le Yang

Le Yang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
18 works
0 followers
16 topics
4 close collaborators

Published work

18 published item(s)

preprint | 2026 | arXiv

Can Attribution Predict Risk? From Multi-View Attribution to Planning Risk Signals in End-to-End Autonomous Driving

End-to-end autonomous driving models generate future trajectories from multi-view inputs, improving system integration but introducing opaque decisions and hard-to-localize risks. Existing methods either rely on auxiliary monitoring models or generate textual explanations, but are decoupled from the planning process and fail to reveal the visual evidence underlying trajectory generation. While attribution offers a direct alternative, planning differs from image classification by taking six-view camera images as input and predicting continuous multi-step trajectories, requiring attribution to capture both critical views and regions and their influence on outputs. Moreover, whether attribution maps can support risk identification remains underexplored. To address this, we propose a hierarchical attribution framework for end-to-end planning. Specifically, using L2 consistency with the original trajectory as the objective, we design a coarse-to-fine region attribution strategy that searches candidate regions across the full six-view input and refines attribution within them. We further extract three attribution statistics as predictive signals for planning risk, including attribution entropy to measure how concentrated the planner's reliance is over the joint visual space, within-camera spatial variance to characterize how spread out the attribution is within each view, and cross-camera Gini coefficient to quantify how unevenly attribution is distributed across the six cameras. Experiments on BridgeAD, UniAD, and GenAD show that these statistics correlate with planning risk, achieving Spearman correlations of $0.30 \pm 0.07$ with trajectory error and AUROC of $0.77 \pm 0.04$ for collision detection. The signal generalizes to held-out scenes with negligible degradation and remains stable under an alternative attribution baseline.
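The three attribution statistics named in this abstract can be illustrated with a minimal sketch. The definitions below are common textbook forms (Shannon entropy, attribution-weighted coordinate variance, and the Gini coefficient of per-camera totals) and are assumptions; the abstract does not give the paper's exact formulas. Attribution maps are assumed to be non-negative nested lists, one 2D map per camera, each with positive total mass where a per-view statistic is computed.

```python
import math

def attribution_entropy(maps):
    """Shannon entropy of attribution mass over all pixels of all views (assumed definition)."""
    flat = [v for view in maps for row in view for v in row]
    total = sum(flat)
    probs = [v / total for v in flat if v > 0]
    return -sum(p * math.log(p) for p in probs)

def spatial_variance(view):
    """Attribution-weighted variance of pixel coordinates within one view (assumed definition)."""
    total = sum(v for row in view for v in row)
    mean_r = sum(r * v for r, row in enumerate(view) for v in row) / total
    mean_c = sum(c * v for row in view for c, v in enumerate(row)) / total
    return sum(((r - mean_r) ** 2 + (c - mean_c) ** 2) * v
               for r, row in enumerate(view)
               for c, v in enumerate(row)) / total

def cross_camera_gini(maps):
    """Gini coefficient of per-camera attribution totals (0 = evenly spread)."""
    masses = sorted(sum(v for row in view for v in row) for view in maps)
    n = len(masses)
    total = sum(masses)
    weighted = sum((i + 1) * m for i, m in enumerate(masses))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

With six cameras, attribution concentrated entirely in one view gives the maximum Gini value of 5/6, while an even split gives 0.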

preprint | 2026 | arXiv

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

Large Language Models (LLMs) demonstrate impressive zero-shot performance across a wide range of natural language processing tasks. Integrating various modality encoders further expands their capabilities, giving rise to Multimodal Large Language Models (MLLMs) that process not only text but also visual and auditory modality inputs. However, these advanced capabilities may also pose significant safety problems, as models can be exploited to generate harmful or inappropriate content through jailbreak attacks. While prior work has extensively explored how manipulating textual or visual modality inputs can circumvent safeguards in LLMs and MLLMs, the vulnerability of audio-specific jailbreak on Large Audio-Language Models (LALMs) remains largely underexplored. To address this gap, we introduce Jailbreak-AudioBench, which consists of the Toolbox, curated Dataset, and comprehensive Benchmark. The Toolbox supports not only text-to-audio conversion but also various editing techniques for injecting audio hidden semantics. The curated Dataset provides diverse explicit and implicit jailbreak audio examples in both original and edited forms. Utilizing this dataset, we evaluate multiple state-of-the-art LALMs and establish the most comprehensive Jailbreak benchmark to date for audio modality. Finally, Jailbreak-AudioBench establishes a foundation for advancing future research on LALMs safety alignment by enabling the in-depth exposure of more powerful jailbreak threats, such as query-based audio editing, and by facilitating the development of effective defense mechanisms.

preprint | 2025 | arXiv

Practical Traceable Over-Threshold Multi-Party Private Set Intersection

Multi-Party Private Set Intersection (MP-PSI) with threshold enhances the flexibility of MP-PSI by disclosing elements present in at least $t$ participants' sets, rather than requiring elements to appear in all $n$ sets. In scenarios where each participant is responsible for its dataset, e.g., digital forensics, MP-PSI with threshold should disclose both intersection elements and corresponding holders such that elements are traceable and the reliability of intersection is guaranteed. We refer to MP-PSI with threshold supporting traceability as Traceable Over-Threshold MP-PSI (T-OT-MP-PSI). However, research on such protocols remains limited, and existing work tolerates at most $t-2$ semi-honest participants at considerable computational cost. We propose two novel Traceable OT-MP-PSI protocols. The first, Efficient Traceable OT-MP-PSI (ET-OT-MP-PSI), combines Shamir's secret sharing with an oblivious programmable pseudorandom function, achieving significantly improved efficiency with resistance to at most $t-2$ semi-honest participants. The second, Security-enhanced Traceable OT-MP-PSI (ST-OT-MP-PSI), achieves security against up to $n-1$ semi-honest participants by further leveraging the oblivious linear evaluation protocol. Compared to Mahdavi et al.'s protocol, ours eliminate the assumption that certain special parties do not collude. Experimental results demonstrate significant improvements: for $n=5$, $t=3$, and sets of size $2^{14}$, ET-OT-MP-PSI achieves $15056\times$ speedup and ST-OT-MP-PSI achieves $505\times$ speedup over Mahdavi et al.'s protocol.
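The threshold behaviour that the abstract attributes to Shamir's secret sharing can be sketched generically as follows. This shows only the textbook $t$-of-$n$ primitive, not the ET-OT-MP-PSI or ST-OT-MP-PSI protocols; the field size, the function names `make_shares` and `reconstruct`, and the use of Python's `random` module are illustrative assumptions (a real deployment would need a cryptographically secure source of randomness).

```python
import random

PRIME = 2**61 - 1  # Mersenne prime; field size chosen only for illustration

def make_shares(secret, t, n, rng=random):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any subset of $t$ shares interpolates the same degree-$(t-1)$ polynomial, which is why an element held by at least $t$ parties can be opened while fewer shares reveal nothing about the constant term.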

preprint | 2022 | arXiv

Analysis and Optimization of Hybrid Caching in mmWave Networks with BS Cooperation

In this paper, we investigate a hybrid caching strategy that maximizes the successful transmission probability (STP) in a millimeter wave (mmWave) cache-enabled network. First, we derive theoretical expressions of the STP and the average system transmission delay by utilizing stochastic geometry; we then consider the maximization of the STP and the minimization of the average system transmission delay by optimizing the design parameters. Considering the optimality structure of the NP-hard problem, the original problem is transformed into a multi-choice knapsack problem (MCKP). Finally, we investigate the impact of key network parameters on the STP and the average system transmission delay. Numerical results demonstrate the superiority of the proposed caching strategy over the conventional caching strategies in the mmWave cache-enabled networks.

preprint | 2022 | arXiv

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars

Online action detection has attracted increasing research interest in recent years. Current works model historical dependencies and anticipate the future to perceive the action evolution within a video segment and improve the detection accuracy. However, the existing paradigm ignores category-level modeling and does not pay sufficient attention to efficiency. Considering a category, its representative frames exhibit various characteristics. Thus, the category-level modeling can provide complementary guidance to the temporal dependencies modeling. This paper develops an effective exemplar-consultation mechanism that first measures the similarity between a frame and exemplary frames, and then aggregates exemplary features based on the similarity weights. This is also an efficient mechanism, as both similarity measurement and feature aggregation require limited computations. Based on the exemplar-consultation mechanism, the long-term dependencies can be captured by regarding historical frames as exemplars, while the category-level modeling can be achieved by regarding representative frames from a category as exemplars. Due to the complementarity from the category-level modeling, our method employs a lightweight architecture but achieves new high performance on three benchmarks. In addition, using a spatio-temporal network to tackle video frames, our method makes a good trade-off between effectiveness and efficiency. Code is available at https://github.com/VividLe/Online-Action-Detection.
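The exemplar-consultation step described in this abstract (measure similarity between a frame and exemplars, then aggregate exemplar features by the similarity weights) can be sketched as below. Dot-product similarity and softmax normalization are assumptions made for illustration, and `consult_exemplars` is a hypothetical name; the released code at the URL above is the authoritative implementation.

```python
import math

def consult_exemplars(frame, exemplars):
    """Aggregate exemplar features weighted by their similarity to `frame`.

    frame: feature vector (list of floats).
    exemplars: list of feature vectors with the same length as `frame`.
    Returns (aggregated_feature, weights).
    """
    # Similarity measurement: dot product (an assumed choice).
    sims = [sum(f * e for f, e in zip(frame, ex)) for ex in exemplars]
    # Softmax over similarities, shifted by the max for numerical stability.
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Feature aggregation: weighted sum of exemplar features.
    aggregated = [sum(w * ex[d] for w, ex in zip(weights, exemplars))
                  for d in range(len(frame))]
    return aggregated, weights
```

Passing historical frames as `exemplars` corresponds to the long-term dependency branch in the abstract, and passing representative frames of a category corresponds to the category-level branch.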

preprint | 2022 | arXiv

Point Cloud Quality Assessment: Dataset Construction and Learning-based No-Reference Metric

Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, in many cases, obtaining the reference point clouds is difficult, so no-reference (NR) metrics have become a research hotspot. Little research on NR-PCQA has been carried out due to the lack of a large-scale PCQA dataset. In this paper, we first build a large-scale PCQA dataset named LS-PCQA, which includes 104 reference point clouds and more than 22,000 distorted samples. In the dataset, each reference point cloud is augmented with 31 types of impairments (e.g., Gaussian noise, contrast distortion, local missing, and compression loss) at 7 distortion levels. In addition, each distorted point cloud is assigned a pseudo quality score as a substitute for its Mean Opinion Score (MOS). Inspired by the hierarchical perception system and considering the intrinsic attributes of point clouds, we propose an NR metric, ResSCNN, based on a sparse convolutional neural network (CNN) to accurately estimate the subjective quality of point clouds. We conduct several experiments to evaluate the performance of the proposed NR metric. The results demonstrate that ResSCNN achieves state-of-the-art (SOTA) performance among all existing NR-PCQA metrics and even outperforms some FR metrics. The dataset presented in this work will be made publicly accessible at http://smt.sjtu.edu.cn. The source code for the proposed ResSCNN can be found at https://github.com/lyp22/ResSCNN.

preprint | 2022 | arXiv

Spatio-Temporal Analysis of SINR Meta Distribution for mmWave Heterogeneous Networks Under Geo/G/1 Queues

A fine-grained analysis of network performance is crucial for system design. In this paper, we focus on the meta distribution of the signal-to-interference-plus-noise-ratio (SINR) in the mmWave heterogeneous networks where the base stations (BS) in each tier are modeled as a Poisson point process (PPP). By utilizing stochastic geometry and queueing theory, we characterize the spatial and temporal randomness while the special characteristics of mmWave communications, including different path loss laws for line-of-sight and non-line-of-sight links and directional beamforming, are incorporated into the analysis. We derive the moments of the conditional successful transmission probability (STP). By taking the temporal random arrival of traffic into consideration, an equation is formulated to derive the meta distribution and the meta distribution can be obtained in a recursive manner. The numerical results reveal the impact of the key network parameters, such as the SINR threshold and the blockage parameter, on the network performance.

preprint | 2022 | arXiv

Structured Attention Composition for Temporal Action Localization

Temporal action localization aims at localizing action instances from untrimmed videos. Existing works have designed various effective modules to precisely localize action instances based on appearance and motion features. However, by treating these two kinds of features with equal importance, previous works cannot take full advantage of each modality feature, making the learned model still sub-optimal. To tackle this issue, we make an early effort to study temporal action localization from the perspective of multi-modality feature learning, based on the observation that different actions exhibit specific preferences to appearance or motion modality. Specifically, we build a novel structured attention composition module. Unlike conventional attention, the proposed module does not infer frame attention and modality attention independently. Instead, by casting the relationship between the modality attention and the frame attention as an attention assignment process, the structured attention composition module learns to encode the frame-modality structure and uses it to regularize the inferred frame attention and modality attention, respectively, upon the optimal transport theory. The final frame-modality attention is obtained by the composition of the two individual attentions. The proposed structured attention composition module can be deployed as a plug-and-play module into existing action localization frameworks. Extensive experiments on two widely used benchmarks show that the proposed structured attention composition consistently improves four state-of-the-art temporal action localization methods and sets a new state of the art on THUMOS14. Code is available at https://github.com/VividLe/Structured-Attention-Composition.

preprint | 2021 | arXiv

Comprehensive study of amorphous metal oxide and Ta$_2$O$_5$-based mixed oxide coatings for gravitational-wave detectors

High finesse optical cavities of current interferometric gravitational-wave detectors are significantly limited in sensitivity by laser quantum noise and coating thermal noise. The thermal noise is associated with internal energy dissipation in the materials that compose the test masses of the interferometer. Our understanding of how the internal friction is linked to the amorphous material structure is limited due to the complexity of the problem and the lack of studies that span a large range of materials. We present a systematic investigation of amorphous metal oxide and Ta$_2$O$_5$-based mixed oxide coatings to evaluate their suitability for low Brownian noise experiments. It is shown that the mechanical loss of metal oxides is correlated to their amorphous morphology, with continuous random network materials such as SiO$_2$ and GeO$_2$ featuring the lowest loss angles. We evaluated different Ta$_2$O$_5$-based mixed oxide thin films and studied the influence of the dopant on the optical and elastic properties of the coating. We estimated the thermal noise associated with high-reflectance multilayer stacks that employ each of the mixed oxides as the high index material. We concluded that the current high index material of TiO$_2$-doped Ta$_2$O$_5$ is the optimal choice for reduced thermal noise among Ta$_2$O$_5$-based mixed oxide coatings with low dopant concentrations.

preprint | 2021 | arXiv

On Meta Distribution and Local Delay for Cache-Enabled Networks with Random DTX: Analysis and Optimization

A fine-grained analysis of the cache-enabled networks is crucial for system design. In this paper, we focus on the meta distribution of the signal-to-interference ratio (SIR) for the cache-enabled networks where the locations of the base stations (BSs) are modeled as a Poisson point process (PPP). With the application of the random caching and the random discontinuous transmission (DTX) schemes, we derive the moments of the conditional successful transmission probability (STP), the exact meta distribution and its beta approximation by utilizing stochastic geometry. The closed-form expressions of the mean and variance of the local delay (i.e., the network jitter) are also derived. We then consider the maximization of the mean STP and the minimization of the average system transmission delay by jointly optimizing the caching probability and the BS active probability. Finally, the numerical results demonstrate the superiority of the proposed optimization schemes over the existing caching strategies and reveal the impacts of the key network parameters on the cache-enabled networks in terms of mean STP, STP variance, meta distribution, mean local delay and network jitter.
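The beta approximation mentioned in this abstract is commonly obtained by moment matching: a beta distribution is fitted to the first two moments $M_1$ and $M_2$ of the conditional STP. The sketch below shows that matching step only; the function name `beta_parameters` is hypothetical, and the moments are assumed to be supplied by the stochastic-geometry analysis, which is not reproduced here.

```python
def beta_parameters(m1, m2):
    """Match a Beta(alpha, beta) distribution to the first two moments.

    m1: first moment (mean) of the conditional success probability.
    m2: second moment; must satisfy m1**2 < m2 < m1 for a non-degenerate fit.
    """
    if not (0.0 < m1 < 1.0) or not (m1 * m1 < m2 < m1):
        raise ValueError("moments must satisfy 0 < m1 < 1 and m1^2 < m2 < m1")
    variance = m2 - m1 * m1
    beta = (m1 - m2) * (1.0 - m1) / variance
    alpha = m1 * beta / (1.0 - m1)
    return alpha, beta
```

Under this fit, the meta distribution at reliability threshold $x$ is approximated by $1 - I_x(\alpha, \beta)$, where $I_x$ is the regularized incomplete beta function. For example, moments $M_1 = 0.4$ and $M_2 = 0.2$ are exactly those of a Beta(2, 3) distribution.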

preprint | 2021 | arXiv

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Due to the need to store the intermediate activations for back-propagation, end-to-end (E2E) training of deep networks usually suffers from a high GPU memory footprint. This paper aims to address this problem by revisiting locally supervised learning, where a network is split into gradient-isolated modules and trained with local supervision. We experimentally show that simply training local modules with E2E loss tends to collapse task-relevant information at early layers, and hence hurts the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible, while progressively discarding task-irrelevant information. As the InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. In fact, we show that the proposed method boils down to minimizing the combination of a reconstruction loss and a normal cross-entropy/contrastive term. Extensive empirical results on five datasets (i.e., CIFAR, SVHN, STL-10, ImageNet and Cityscapes) validate that InfoPro is capable of achieving competitive performance with less than 40% memory footprint compared to E2E training, while allowing the use of training data with higher resolution or larger batch sizes under the same GPU memory constraint. Our method also enables training local modules asynchronously for potential training acceleration. Code is available at: https://github.com/blackfeather-wang/InfoPro-Pytorch.

preprint | 2020 | arXiv

A Fine-Grained Analysis of mmWave Heterogeneous Networks

A fine-grained analysis of the cache-enabled networks is crucial for system design. In this paper, we focus on the meta distribution of the signal-to-interference-plus-noise-ratio (SINR) in the mmWave heterogeneous networks where the base stations (BS) in each tier are modeled as a Poisson point process (PPP). By utilizing stochastic geometry, we derive the moments of the conditional success probability, based on which the exact expression of the meta distribution and its beta approximation are derived. In addition, key performance metrics, namely the success probability, the variance of the conditional success probability, the mean local delay and the network jitter, are obtained. The distinguishing characteristics of mmWave communications, including different path loss laws for line-of-sight and non-line-of-sight links and directional beamforming, are incorporated into the analysis. The simulation results reveal the impact of the key network parameters, such as the blockage parameter, bias factor, number of antenna elements and density, on the performance.

preprint | 2020 | arXiv

A method for the experimental measurement of bulk and shear loss angles in amorphous thin films

Brownian thermal noise is a limiting factor for the sensitivity of many high precision metrology applications, among others gravitational-wave detectors. The origin of Brownian noise can be traced to internal friction in the amorphous materials that are used for the high reflection coatings. To properly characterize the internal friction in an amorphous material, one needs to consider the bulk and shear losses separately. In most previous works the two loss angles were considered equal, although without any first-principles motivation. In this work we present a method that can be used to extract the material bulk and shear loss angles, based on current state-of-the-art coating ring-down measurement systems. We also show that for titania-doped tantala, a material commonly used in gravitational-wave detector coatings, the experimental data strongly favor a model with two distinct loss angles over the simpler case of a single loss angle.

preprint | 2020 | arXiv

Resolution Adaptive Networks for Efficient Inference

Adaptive inference is an effective mechanism to achieve a dynamic tradeoff between accuracy and computational cost in deep networks. Existing works mainly exploit architecture redundancy in network depth or width. In this paper, we focus on spatial redundancy of input samples and propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs containing large objects with prototypical features, while only some "hard" samples need spatially detailed information. In RANet, the input images are first routed to a lightweight sub-network that efficiently extracts low-resolution representations, and those samples with high prediction confidence will exit early from the network without being further processed. Meanwhile, high-resolution paths in the network maintain the capability to recognize the "hard" samples. Therefore, RANet can effectively reduce the spatial redundancy involved in inferring high-resolution inputs. Empirically, we demonstrate the effectiveness of the proposed RANet on the CIFAR-10, CIFAR-100 and ImageNet datasets in both the anytime prediction setting and the budgeted batch classification setting.
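The confidence-based early exit described in this abstract can be sketched as a loop over classifiers ordered from cheapest to most expensive. This is a simplified illustration, not RANet itself: the sub-networks are modeled as independent callables returning class logits, and `adaptive_predict`, `stages`, and the default `threshold` value are hypothetical names and settings chosen for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def adaptive_predict(image, stages, threshold=0.9):
    """Run stages in order and exit at the first sufficiently confident one.

    stages: callables ordered from low resolution (cheap) to high resolution
            (expensive); each maps the input to a list of class logits.
    Returns (predicted_class, index_of_exit_stage).
    """
    for depth, stage in enumerate(stages):
        probs = softmax(stage(image))
        confidence = max(probs)
        is_last = depth == len(stages) - 1
        # "Easy" inputs exit early; the last stage always answers.
        if confidence >= threshold or is_last:
            return probs.index(confidence), depth
```

Raising `threshold` sends more inputs to the later, more expensive stages, which is one way to trade computation for accuracy in the budgeted setting the abstract mentions.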

preprint | 2020 | arXiv

Revisiting Anchor Mechanisms for Temporal Action Localization

Most current action localization methods follow an anchor-based pipeline: depicting action instances by pre-defined anchors, learning to select the anchors closest to the ground truth, and predicting the confidence of anchors with refinements. Pre-defined anchors set priors on the location and duration of action instances, which facilitates the localization of common action instances but limits the flexibility for tackling action instances with drastic variations, especially extremely short or extremely long ones. To address this problem, this paper proposes a novel anchor-free action localization module that assists action localization by temporal points. Specifically, this module represents an action instance as a point with its distances to the starting boundary and ending boundary, alleviating the pre-defined anchor restrictions in terms of action localization and duration. The proposed anchor-free module is capable of predicting action instances whose duration is either extremely short or extremely long. By combining the proposed anchor-free module with a conventional anchor-based module, we propose a novel action localization framework, called A2Net. The cooperation between anchor-free and anchor-based modules achieves superior performance to the state-of-the-art on THUMOS14 (45.5% vs. 42.8%). Furthermore, comprehensive experiments demonstrate the complementarity between the anchor-free and the anchor-based module, making A2Net simple but effective.

preprint | 2020 | arXiv

Spatio-Temporal Analysis of Cellular Networks with Cell-Center/Edge Users

The emergence of various types of services has brought about explosive growth of traffic as well as diversified traffic characteristics in cellular networks. A comprehensive understanding of the influence of various traffic conditions is vital for the deployment of next-generation wireless networks. In this paper, we develop a mathematical analytical model by utilizing queuing theory and stochastic geometry, in which the randomness of the traffic and the geographical locations of the interferers can be captured. We derive the b-th moments of the conditional success probability and the closed-form expressions of the meta distribution for the cell-center users (CCUs) and the cell-edge users (CEUs), respectively. Fixed-point equations are then formulated to obtain the exact value of the meta distribution by taking the random arrival traffic into consideration, and the impact of the random arrival traffic on the queue status is revealed. In addition, the mean local delays for CCUs and CEUs are derived, and the corresponding regions for CCUs and CEUs where the mean local delays remain finite are obtained. Finally, the impact of the critical network parameters on the meta distribution and the mean local delay is investigated with the numerical results.

preprint | 2020 | arXiv

Structural evolution of binary oxide nanolaminates with annealing and its impact on room-temperature internal friction

Internal friction in oxide thin films imposes a critical limitation on the sensitivity and stability of ultra-high finesse optical cavities for gravitational wave detectors. Strategies like doping or creating nanolaminates are sought to introduce structural modifications that reduce internal friction. This work describes an investigation of the morphological changes that SiO$_2$/Ta$_2$O$_5$ and TiO$_2$/Ta$_2$O$_5$ nanolaminates undergo with annealing and their impact on room temperature internal friction. It is demonstrated that thermal treatment results in a reduction of internal friction in both nanolaminates, but through different pathways. In the SiO$_2$/Ta$_2$O$_5$ nanolaminate, whose layers remain intact after annealing, the total reduction in internal friction follows the reduction in the composing SiO$_2$ and Ta$_2$O$_5$ layers. In contrast, interdiffusion initiated by annealing at the interface of the TiO$_2$/Ta$_2$O$_5$ nanolaminate and the formation of a mixed phase dictate a more significant reduction in internal friction to $\sim 2.6 \times 10^{-4}$, a value lower than any other Ta$_2$O$_5$ mixture coating with similar cation concentration.

preprint | 2019 | arXiv

Simulation of temperature profile for the electron- and the lattice-systems in laterally structured layered conductors

Electrons in operating microelectronic semiconductor devices are accelerated by strong, locally varying electric fields and acquire effective electron temperatures that are distributed nonuniformly at the nanoscale and largely exceed the temperature of the host crystal lattice. The thermal dynamics of the electrons and the lattice are hence nontrivial, and understanding them at the nanoscale is decisively important for achieving higher device performance. Here, we propose and demonstrate that, in layered conductors, the nonequilibrium between the electrons and the lattice can be treated explicitly by simulating the conducting layer as two separate physical sheets representing, respectively, the electron and the lattice subsystems. As an example of simulating GaAs devices, we take a 35 nm thick, 1 µm wide U-shaped conducting channel with a 15 nm radius of curvature at the inner corner of the U-shaped bend, and find that a remarkable hot spot develops at the inner corner due to hot-electron generation. The hot spot in the electron temperature reaches a significantly higher temperature and has a far sharper spatial distribution than the hot spot in the lattice temperature. A similar simulation of a narrow metal (NiCr) lead of similar geometry shows that a hot spot also appears at the inner corner, but its strength and spatial profile differ greatly from those in semiconductor devices. The remarkable difference between the semiconductor and the metal is interpreted as being due to the large difference in the electron specific heat, rather than the difference in the electron-phonon interaction. This work provides useful hints toward a deeper understanding of the nonequilibrium properties of electrical conductors, through a simple and convenient method for modeling nonequilibrium layered conductors.