Source author record

Haoyu Li

Haoyu Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.AS, Cryptography and Security, Graphics, Sound, Artificial Intelligence, Human-Computer Interaction, Machine Learning, Biological Physics, Computer Vision, math.AP, physics.optics

Catalog footprint

What is connected

11 works

11 topics

4 close collaborators

preprint, 2026, arXiv

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Jailbreak attacks pose significant threats to large language models (LLMs), enabling attackers to bypass safeguards. However, existing reactive defense approaches struggle to keep up with the rapidly evolving multi-turn jailbreaks, where attackers continuously deepen their attacks to exploit vulnerabilities. To address this critical challenge, we propose HoneyTrap, a novel deceptive LLM defense framework leveraging collaborative defenders to counter jailbreak attacks. It integrates four defensive agents, Threat Interceptor, Misdirection Controller, Forensic Tracker, and System Harmonizer, each performing a specialized security role and collaborating to complete a deceptive defense. To ensure a comprehensive evaluation, we introduce MTJ-Pro, a challenging multi-turn progressive jailbreak dataset that combines seven advanced jailbreak strategies designed to gradually deepen attack strategies across multi-turn attacks. Besides, we present two novel metrics: Mislead Success Rate (MSR) and Attack Resource Consumption (ARC), which provide more nuanced assessments of deceptive defense beyond conventional measures. Experimental results on GPT-4, GPT-3.5-turbo, Gemini-1.5-pro, and LLaMa-3.1 demonstrate that HoneyTrap achieves an average reduction of 68.77% in attack success rates compared to state-of-the-art baselines. Notably, even in a dedicated adaptive attacker setting with intensified conditions, HoneyTrap remains resilient, leveraging deceptive engagement to prolong interactions, significantly increasing the time and computational costs required for successful exploitation. Unlike simple rejection, HoneyTrap strategically wastes attacker resources without impacting benign queries, improving MSR and ARC by 118.11% and 149.16%, respectively.

preprint, 2022, arXiv

DDS: A new device-degraded speech dataset for speech enhancement

A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality device-degraded speech into high-quality speech is a goal of speech enhancement (SE). This paper introduces a new speech dataset, DDS, to facilitate research on SE. DDS provides aligned parallel recordings of high-quality speech (recorded in professional studios) and a number of versions of low-quality speech, producing approximately 2,000 hours of speech data. The DDS dataset covers 27 realistic recording conditions by combining diverse acoustic environments and microphone devices, and each version of a condition consists of multiple recordings from six microphone positions to simulate different noise and reverberation levels. We also test several SE baseline systems on the DDS dataset and show the impact of recording diversity on performance.

preprint, 2022, arXiv

Efficient Interpolation-based Pathline Tracing with B-spline Curves in Particle Dataset

Particle tracing through numerical integration is a well-known approach to generating pathlines for visualization. However, for particle simulations, the computation of pathlines is expensive, since the interpolation method is complicated due to the lack of connectivity information. Previous studies utilize the k-d tree to reduce the time for neighborhood search. However, the efficiency is still limited by the number of tracing time steps. Therefore, we propose a novel interpolation-based particle tracing method that first represents particle data as B-spline curves and interpolates B-spline control points to reduce the number of interpolation time steps. We demonstrate our approach achieves good tracing accuracy with much less computation time.

preprint, 2022, arXiv

Local Latent Representation based on Geometric Convolution for Particle Data Feature Exploration

Feature-related particle data analysis plays an important role in many scientific applications such as fluid simulations, cosmology simulations and molecular dynamics. Compared to conventional methods that use hand-crafted feature descriptors, some recent studies focus on transforming the data into a new latent space, where features are easier to identify, compare and extract. However, it is challenging to transform particle data into latent representations, since the convolutional neural networks used in prior studies require the data to be presented on regular grids. In this paper, we adopt Geometric Convolution, a neural network building block designed for 3D point clouds, to create latent representations for scientific particle data. These latent representations capture both the particle positions and their physical attributes in the local neighborhood so that features can be extracted by clustering in the latent space, and tracked by applying tracking algorithms such as mean-shift. We validate the extracted features and tracking results from our approach using datasets from three applications and show that they are comparable to the methods that define hand-crafted features for each specific dataset.

preprint, 2022, arXiv

VDL-Surrogate: A View-Dependent Latent-based Model for Parameter Space Exploration of Ensemble Simulations

We propose VDL-Surrogate, a view-dependent neural-network-latent-based surrogate model for parameter space exploration of ensemble simulations that allows high-resolution visualizations and user-specified visual mappings. Surrogate-enabled parameter space exploration allows domain scientists to preview simulation results without having to run a large number of computationally costly simulations. Limited by computational resources, however, existing surrogate models may not produce previews with sufficient resolution for visualization and analysis. To improve the efficient use of computational resources and support high-resolution exploration, we perform ray casting from different viewpoints to collect samples and produce compact latent representations. This latent encoding process reduces the cost of surrogate model training while maintaining the output quality. In the model training stage, we select viewpoints to cover the whole viewing sphere and train corresponding VDL-Surrogate models for the selected viewpoints. In the model inference stage, we predict the latent representations at previously selected viewpoints and decode the latent representations to data space. For any given viewpoint, we make interpolations over decoded data at selected viewpoints and generate visualizations with user-specified visual mappings. We show the effectiveness and efficiency of VDL-Surrogate in cosmological and ocean simulations with quantitative and qualitative evaluations. Source code is publicly available at https://github.com/trainsn/VDL-Surrogate.

preprint, 2020, arXiv

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for the intelligibility loss, with the constraint that the root mean square (RMS) level and duration of the speech signal are maintained before and after modifications. Specifically, we utilize an iMetricGAN approach to optimize the speech intelligibility metrics with generative adversarial networks (GANs). Experimental results show that the proposed iMetricGAN outperforms conventional state-of-the-art algorithms in terms of objective measures, i.e., speech intelligibility in bits (SIIB) and extended short-time objective intelligibility (ESTOI), under a Cafeteria noise condition. In addition, formal listening tests reveal significant intelligibility gains when both noise and reverberation exist.
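The RMS and duration constraint mentioned in this abstract can be illustrated with a minimal sketch. This is not the iMetricGAN model or the authors' code: the function names `rms` and `match_rms` and the sample values are hypothetical, and the "modified" signal is just a stand-in for the output of any speech modification step. The sketch only shows what it means for the RMS level and the number of samples to be the same before and after modification.

```python
import math


def rms(signal):
    """Root mean square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(modified, reference):
    """Rescale `modified` so that its RMS level equals that of `reference`.

    Hypothetical helper: it enforces the two constraints named in the
    abstract (same duration, same RMS level), nothing more.
    """
    if len(modified) != len(reference):
        raise ValueError("duration must be unchanged")
    level = rms(modified)
    if level == 0.0:
        # A silent signal cannot be rescaled; return it unchanged.
        return list(modified)
    gain = rms(reference) / level
    return [x * gain for x in modified]


# Illustrative sample values only (assumed, not taken from the paper).
original = [0.1, -0.2, 0.3, -0.4]
boosted = [0.3, -0.1, 0.6, -0.2]  # stand-in for a modified signal
constrained = match_rms(boosted, original)
```

After the call, `constrained` has the same length as `original` and the same RMS level up to floating-point error, so any intelligibility gain would have to come from how the energy is redistributed rather than from simply making the speech louder.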

preprint, 2020, arXiv

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework that can discover discrete groups of features from a speech signal without supervision. Until now, the VQ-VAE architecture has modeled only individual types of speech features, such as only phones or only F0. This paper introduces an important extension to VQ-VAE for learning F0-related suprasegmental information simultaneously along with traditional phone features. The proposed framework uses two encoders such that the F0 trajectory and speech waveform are both input to the system, so that two separate codebooks are learned. We used a WaveRNN vocoder as the decoder component of VQ-VAE. Our speaker-independent VQ-VAE was trained with raw speech waveforms from multi-speaker Japanese speech databases. Experimental results show that the proposed extension reduces F0 distortion of reconstructed speech for all unseen test speakers, and results in significantly higher preference scores from a listening test. We additionally conducted experiments using single-speaker Mandarin speech to demonstrate advantages of our architecture in another language which relies heavily on F0.

preprint, 2020, arXiv

Multiple nodal solutions having shared componentwise nodal numbers for coupled Schrödinger equations

We investigate the structure of nodal solutions for coupled nonlinear Schrödinger equations in the repulsive coupling regime. Among other results, for the following coupled system of $N$ equations, we prove the existence of infinitely many nodal solutions which share the same componentwise-prescribed nodal numbers
\begin{equation}\label{ab}
\left\{ \begin{array}{l} -\Delta u_{j}+\lambda u_{j}=\mu u_{j}^{3}+\sum_{i\neq j}\beta u_{j}u_{i}^{2} \quad \text{in } \Omega, \\ u_{j}\in H_{0,r}^{1}(\Omega), \quad j=1,\dots,N, \end{array} \right.
\end{equation}
where $\Omega$ is a radial domain in $\mathbb{R}^n$ for $n\leq 3$, $\lambda>0$, $\mu>0$, and $\beta<0$. More precisely, let $p$ be a prime factor of $N$ and write $N=pB$. Suppose $\beta\leq-\frac{\mu}{p-1}$. Then for any given non-negative integers $P_{1},P_{2},\dots,P_{B}$, (\ref{ab}) has infinitely many solutions $(u_{1},\dots,u_{N})$ such that each of these solutions satisfies the same property: for $b=1,\dots,B$, $u_{pb-p+i}$ changes sign precisely $P_b$ times for $i=1,\dots,p$. The result reveals the complex nature of the solution structure in the repulsive coupling regime due to componentwise segregation of solutions. Our method is to combine a heat flow approach as deformation with a minimax construction of the symmetric mountain pass theorem using a $\mathbb{Z}_p$ group action index. Our method is robust and also yields the existence of one solution without assuming any symmetry of the coupling.

preprint, 2020, arXiv

Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

In recent years, speech enhancement (SE) has achieved impressive progress with the success of deep neural networks (DNNs). However, the DNN approach usually fails to generalize well to unseen environmental noise that is not included in the training. To address this problem, we propose "noise tokens" (NTs), which are a set of neural noise templates that are jointly trained with the SE system. NTs dynamically capture the environment variability and thus enable the DNN model to handle various environments to produce STFT magnitude with higher quality. Experimental results show that using NTs is an effective strategy that consistently improves the generalization ability of SE systems across different DNN architectures. Furthermore, we investigate applying a state-of-the-art neural vocoder to generate waveform instead of traditional inverse STFT (ISTFT). Subjective listening tests show the residual noise can be significantly suppressed through mel-spectrogram correction and vocoder-based waveform synthesis.

preprint, 2016, arXiv

Volumetric Light-field Encryption at the Microscopic Scale

We report a light-field based method that allows the optical encryption of three-dimensional (3D) volumetric information at the microscopic scale in a single 2D light-field image. The system consists of a microlens array and an array of random phase/amplitude masks. The method utilizes a wave optics model to account for the dominant diffraction effect at this new scale, and the system point-spread function (PSF) serves as the key for encryption and decryption. We successfully developed and demonstrated a deconvolution algorithm to retrieve spatially multiplexed discrete and continuous volumetric data from 2D light-field images. Showing that the method is practical for data transmission and storage, we obtained a faithful reconstruction of the 3D volumetric information from a digital copy of the encrypted light-field image. The method represents a new level of optical encryption, paving the way for broad industrial and biomedical applications in processing and securing 3D data at the microscopic scale.

preprint, 2013, arXiv

Tap-Wave-Rub: Lightweight Malware Prevention for Smartphones Using Intuitive Human Gestures

In this paper, we introduce a lightweight permission enforcement approach, Tap-Wave-Rub (TWR), for smartphone malware prevention. TWR is based on simple human gestures that are very quick and intuitive but less likely to be exhibited in users' daily activities. Presence or absence of such gestures, prior to accessing an application, can effectively inform the OS whether the access request is benign or malicious. Specifically, we present the design of two mechanisms: (1) accelerometer-based phone tapping detection; and (2) proximity-sensor-based finger tapping, rubbing or hand waving detection. The first mechanism is geared for NFC applications, which usually require the user to tap her phone with another device. The second mechanism involves very simple gestures, i.e., tapping or rubbing a finger near the top of the phone's screen or waving a hand close to the phone, and broadly appeals to many applications (e.g., SMS). In addition, we present the TWR-enhanced Android permission model, the prototypes implementing the underlying gesture recognition mechanisms, and a variety of novel experiments to evaluate these mechanisms. Our results suggest the proposed approach could be very effective for malware detection and prevention, with quite low false positives and false negatives, while imposing little to no additional burden on the users.
