Source author record

Qiben Yan

Qiben Yan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security eess.AS Sound Human-Computer Interaction Machine Learning

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems?

The emergence of Artificial Intelligence (AI)-driven audio attacks has revealed new security vulnerabilities in voice control systems. While researchers have introduced a multitude of attack strategies targeting voice control systems (VCS), the continual advancements of VCS have diminished the impact of many such attacks. Recognizing this dynamic landscape, our study endeavors to comprehensively assess the resilience of commercial voice control systems against a spectrum of malicious audio attacks. Through extensive experimentation, we evaluate six prominent attack techniques across a collection of voice control interfaces and devices. Contrary to prevailing narratives, our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats. Particularly, our research highlights the ineffectiveness of white-box attacks in black-box scenarios. Furthermore, the adversaries encounter substantial obstacles in obtaining precise gradient estimations during query-based interactions with commercial systems, such as Apple Siri and Samsung Bixby. Meanwhile, we find that current defense strategies are not completely immune to advanced attacks. Our findings contribute valuable insights for enhancing defense mechanisms in VCS. Through this survey, we aim to raise awareness within the academic community about the security concerns of VCS and advocate for continued research in this crucial area.

preprint2022arXiv

GhostTalk: Interactive Attack on Smartphone Voice System Through Power Line

Inaudible voice command injection is one of the most threatening attacks towards voice assistants. Existing attacks aim at injecting the attack signals over the air, but they require the access to the authorized user's voice for activating the voice assistants. Moreover, the effectiveness of the attacks can be greatly deteriorated in a noisy environment. In this paper, we explore a new type of channel, the power line side-channel, to launch the inaudible voice command injection. By injecting the audio signals over the power line through a modified charging cable, the attack becomes more resilient against various environmental factors and liveness detection models. Meanwhile, the smartphone audio output can be eavesdropped through the modified cable, enabling a highly-interactive attack. To exploit the power line side-channel, we present GhostTalk, a new hidden voice attack that is capable of injecting and eavesdropping simultaneously. Via a quick modification of the power bank cables, the attackers could launch interactive attacks by remotely making a phone call or capturing private information from the voice assistants. GhostTalk overcomes the challenge of bypassing the speaker verification system by stealthily triggering a switch component to simulate the press button on the headphone. In case when the smartphones are charged by an unaltered standard cable, we discover that it is possible to recover the audio signal from smartphone loudspeakers by monitoring the charging current on the power line. To demonstrate the feasibility, we design GhostTalk-SC, an adaptive eavesdropper system targeting smartphones charged in the public USB ports. To correctly recognize the private information in the audio, GhostTalk-SC carefully extracts audio spectra and integrates a neural network model to classify spoken digits in the speech.

preprint2022arXiv

Mining Function Homology of Bot Loaders from Honeypot Logs

Self-contained loaders are widely adopted in botnets for injecting loading commands and spawning new bots. While researchers can dissect bot clients to get various information of botnets, the cloud-based and self-contained design of loaders effectively hinders researchers from understanding the loaders' evolution and variation using classic methods. The decoupled nature of bot loaders also dramatically reduces the feasibility of investigating relationships among clients and infrastructures. In this paper, we propose a text-based method to investigate and analyze details of bot loaders using honeypots. We leverage high interaction honeypots to collect request logs and define eight families of bot loaders based on the result of agglomerative clustering. At the function level, we push our study further to explore their homological relationship based on similarity analysis of request logs using sequence aligning techniques. This further exploration discloses that the released code of Mirai keeps spawning new generations of botnets both on the client and the server side. This paper uncovers the homology of active botnet infrastructures, providing a new prospect on finding covert relationships among cybercrimes. Bot loaders are precisely investigated at the function level to yield a new insight for researchers to identify the botnet's infrastructures and track their evolution over time.

preprint2022arXiv

NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing

In this paper, we propose NEC (Neural Enhanced Cancellation), a defense mechanism, which prevents unauthorized microphones from capturing a target speaker's voice. Compared with the existing scrambling-based audio cancellation approaches, NEC can selectively remove a target speaker's voice from a mixed speech without causing interference to others. Specifically, for a target speaker, we design a Deep Neural Network (DNN) model to extract high-level speaker-specific but utterance-independent vocal features from his/her reference audios. When the microphone is recording, the DNN generates a shadow sound to cancel the target voice in real-time. Moreover, we modulate the audible shadow sound onto an ultrasound frequency, making it inaudible for humans. By leveraging the non-linearity of the microphone circuit, the microphone can accurately decode the shadow sound for target voice cancellation. We implement and evaluate NEC comprehensively with 8 smartphone microphones in different settings. The results show that NEC effectively mutes the target speaker at a microphone without interfering with other users' normal conversations.

preprint2022arXiv

SoundFence: Securing Ultrasonic Sensors in Vehicles Using Physical-Layer Defense

Autonomous vehicles (AVs), equipped with numerous sensors such as camera, LiDAR, radar, and ultrasonic sensor, are revolutionizing the transportation industry. These sensors are expected to sense reliable information from a physical environment, facilitating the critical decision-making process of the AVs. Ultrasonic sensors, which detect obstacles in a short distance, play an important role in assisted parking and blind spot detection events. However, due to their weak security level, ultrasonic sensors are particularly vulnerable to signal injection attacks, when the attackers inject malicious acoustic signals to create fake obstacles and intentionally mislead the vehicles to make wrong decisions with disastrous aftermath. In this paper, we systematically analyze the attack model of signal injection attacks toward moving vehicles. By considering the potential threats, we propose SoundFence, a physical-layer defense system which leverages the sensors' signal processing capability without requiring any additional equipment. SoundFence verifies the benign measurement results and detects signal injection attacks by analyzing sensor readings and the physical-layer signatures of ultrasonic signals. Our experiment with commercial sensors shows that SoundFence detects most (more than 95%) of the abnormal sensor readings with very few false alarms, and it can also accurately distinguish the real echo from injected signals to identify injection attacks.

preprint2022arXiv

SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from an audible frequency range of voice commands. However, they often have high error rates and/or long delays. In this paper, we explore a new direction of human voice research by scrutinizing the unique characteristics of human speech at the ultrasound frequency band. Our research indicates that the high-frequency ultrasound components (e.g. speech fricatives) from 20 to 48 kHz can significantly enhance the security and accuracy of speaker verification. We propose a speaker verification system, SUPERVOICE that uses a two-stream DNN architecture with a feature fusion mechanism to generate distinctive speaker models. To test the system, we create a speech dataset with 12 hours of audio (8,950 voice samples) from 127 participants. In addition, we create a second spoofed voice dataset to evaluate its security. In order to balance between controlled recordings and real-world applications, the audio recordings are collected from two quiet rooms by 8 different recording devices, including 7 smartphones and an ultrasound microphone. Our evaluation shows that SUPERVOICE achieves 0.58% equal error rate in the speaker verification task, it only takes 120 ms for testing an incoming utterance, outperforming all existing speaker verification systems. Moreover, within 91 ms processing time, SUPERVOICE achieves 0% equal error rate in detecting replay attacks launched by 5 different loudspeakers.

Qiben Yan

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems?

GhostTalk: Interactive Attack on Smartphone Voice System Through Power Line

Mining Function Homology of Bot Loaders from Honeypot Logs

NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing

SoundFence: Securing Ultrasonic Sensors in Vehicles Using Physical-Layer Defense

SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech