Source author record

Weiwei Yang

Weiwei Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SP · Information Theory · math.IT · Machine Learning · Computation and Language · Methodology · Artificial Intelligence · Computer Vision · Cryptography and Security · Human-Computer Interaction · physics.soc-ph

Catalog footprint

What is connected

13 works

11 topics

4 close collaborators

Preprint · 2026 · arXiv

Agentic-imodels: Evolving agentic interpretability tools via autoresearch

Agentic data science (ADS) systems are rapidly improving their capability to autonomously analyze, fit, and interpret data, potentially moving towards a future where agents conduct the vast majority of data-science work. However, current ADS systems use statistical tools designed to be interpretable by humans rather than by agents. To address this, we introduce Agentic-imodels, an agentic autoresearch loop that evolves data-science tools designed to be interpretable by agents. Specifically, it develops a library of scikit-learn-compatible regressors for tabular data that are optimized for both predictive performance and a novel LLM-based interpretability metric. The metric is based on a suite of LLM-graded tests that probe whether a fitted model's string representation is "simulatable" by an LLM, i.e., whether the LLM can answer questions about the model's behavior by reading its string output alone. We find that the evolved models jointly improve predictive performance and agent-facing interpretability, and that they generalize to new datasets and new interpretability tests. Furthermore, these evolved models improve downstream end-to-end ADS, increasing the performance of Copilot CLI, Claude Code, and Codex on the BLADE benchmark by up to 73%.
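To make the idea of a "simulatable" string representation concrete, here is a toy sketch; it is not code from Agentic-imodels. It assumes a single-feature least-squares model written in plain Python that only mimics scikit-learn's fit/predict naming, and its __str__ prints the fitted formula so that a reader (human or LLM) could compute predictions by hand. The class name SimulatableLinearRegressor and the output format are hypothetical.

```python
class SimulatableLinearRegressor:
    """Hypothetical toy regressor: y ~ intercept + slope * x0, fit by ordinary least squares.

    Follows the fit/predict naming convention only; it is not a real scikit-learn estimator.
    """

    def fit(self, X, y):
        # X is a list of single-feature rows, e.g. [[1.0], [2.0]]; y is a list of targets.
        xs = [row[0] for row in X]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(y) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self

    def predict(self, X):
        return [self.intercept_ + self.coef_ * row[0] for row in X]

    def __str__(self):
        # The string is the whole model: reading it is enough to "simulate" predictions.
        return f"y = {self.intercept_:.3g} + {self.coef_:.3g} * x0"


model = SimulatableLinearRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
print(model)  # y = 1 + 2 * x0
```

Given only the printed line, a question such as "what is the prediction at x0 = 3?" can be answered without running the model, which is the property the abstract's LLM-graded tests are described as probing.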

Preprint · 2026 · arXiv

MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety

We present MultiBreak, a scalable and diverse multi-turn jailbreak benchmark for evaluating large language model (LLM) safety. Multi-turn jailbreaks mimic natural conversational settings, which makes them more effective than single-turn jailbreaks at bypassing the safeguards of safety-aligned LLMs. Existing multi-turn benchmarks are limited in size or rely heavily on templates, which restrict their diversity. To address this gap, we unify a wide range of harmful jailbreak intents and introduce an active learning pipeline for expanding high-quality multi-turn adversarial prompts, in which a generator is iteratively fine-tuned to produce stronger attack candidates, guided by uncertainty-based refinement. MultiBreak includes 10,389 multi-turn adversarial prompts, spans 2,665 distinct harmful intents, and covers the most diverse set of topics to date. Empirical evaluation shows that our benchmark achieves an attack success rate (ASR) up to 54.0 and 34.6 higher than the second-best dataset on DeepSeek-R1-7B and GPT-4.1-mini, respectively. More importantly, safety evaluations suggest that diverse attack categories uncover fine-grained LLM vulnerabilities, and that categories which appear benign in single-turn settings can exhibit substantially higher adversarial effectiveness in multi-turn scenarios. These findings highlight persistent vulnerabilities of LLMs under realistic adversarial settings and establish MultiBreak as a scalable resource for advancing LLM safety.

Preprint · 2022 · arXiv

Deep Learning with Label Noise: A Hierarchical Approach

Deep neural networks are susceptible to label noise. Existing methods to improve robustness, such as meta-learning and regularization, usually require significant changes to the network architecture or careful tuning of the optimization procedure. In this work, we propose a simple hierarchical approach that incorporates a label hierarchy when training deep learning models. Our approach requires no change to the network architecture or the optimization procedure. We evaluate our hierarchical network on a wide range of simulated and real datasets and various types of label noise. Our hierarchical approach improves upon regular deep neural networks in learning with label noise. Combining our hierarchical approach with pre-trained models achieves state-of-the-art performance on real-world noisy datasets.
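One common way to "incorporate a label hierarchy" without touching the architecture is to derive coarse-class probabilities from the existing fine-class softmax and add a coarse-level loss term. The sketch below illustrates that generic idea only; the abstract does not say this is the paper's formulation. The function name hierarchical_loss, the fine_to_coarse mapping, and the weight alpha are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_loss(logits, fine_label, fine_to_coarse, alpha=0.5):
    """Hypothetical two-level loss: fine cross-entropy plus alpha * coarse cross-entropy.

    Coarse probabilities are sums of fine-class probabilities within each coarse group,
    so the network's output layer stays exactly as it was.
    """
    probs = softmax(logits)
    fine_ce = -math.log(probs[fine_label])
    coarse_label = fine_to_coarse[fine_label]
    coarse_prob = sum(p for k, p in enumerate(probs) if fine_to_coarse[k] == coarse_label)
    coarse_ce = -math.log(coarse_prob)
    return fine_ce + alpha * coarse_ce


# Four fine classes grouped into two coarse classes: {0, 1} and {2, 3}.
print(hierarchical_loss([0.0, 0.0, 0.0, 0.0], 0, [0, 0, 1, 1]))
```

The coarse term penalizes a prediction that puts mass on the wrong coarse group more than one that confuses two fine classes within the same group, which is one plausible route to robustness when noisy labels tend to stay within a group.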

Preprint · 2022 · arXiv

Energy Efficient Design in IRS-Assisted UAV Data Collection System under Malicious Jamming

In this paper, we study an unmanned aerial vehicle (UAV) enabled data collection system, in which an intelligent reflecting surface (IRS) is deployed to assist the communication from a cluster of Internet-of-Things (IoT) devices to a UAV in the presence of a jammer. We aim to improve the energy efficiency (EE) via the joint design of the UAV trajectory, IRS passive beamforming, device power allocation, and communication scheduling. However, the formulated non-linear fractional programming problem is challenging to solve due to its non-convexity and coupled variables. To overcome this difficulty, we propose an alternating-optimization-based algorithm that solves it sub-optimally by leveraging Dinkelbach's algorithm, the successive convex approximation (SCA) technique, and the block coordinate descent (BCD) method. Extensive simulation results show that the proposed design can significantly improve anti-jamming performance. In particular, for the remote jammer case, the proposed design can substantially shorten the flight path, and thus decrease energy consumption, via signal enhancement. For the local jammer case, which is considered highly challenging in conventional systems without an IRS because the strategy of retreating away from the jammer becomes ineffective, our proposed design achieves an even higher performance gain owing to efficient mitigation of the jamming signal.
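Dinkelbach's algorithm, one of the building blocks named above, turns a ratio objective such as energy efficiency (throughput divided by energy) into a sequence of subtractive subproblems max N(x) - lam * D(x). The sketch below is a generic textbook version over a finite candidate set, with brute-force search standing in for the SCA/BCD subproblem solvers the paper uses; the function name dinkelbach and the toy objective are illustrative assumptions, not taken from the paper.

```python
def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate set.

    Assumes denominator(x) > 0 for every candidate. Each iteration solves the
    parametric subproblem max_x N(x) - lam * D(x) (here by brute force) and then
    sets lam to the ratio achieved by that maximizer.
    """
    x = candidates[0]
    lam = numerator(x) / denominator(x)
    for _ in range(max_iter):
        x = max(candidates, key=lambda c: numerator(c) - lam * denominator(c))
        f = numerator(x) - lam * denominator(x)
        if f <= tol:
            break  # F(lam) == 0 (within tol) means lam is the optimal ratio
        lam = numerator(x) / denominator(x)
    return x, lam


# Toy "efficiency" x / (1 + (x - 3)^2) over integer design choices 1..6.
best_x, best_ratio = dinkelbach(list(range(1, 7)), lambda x: x, lambda x: 1 + (x - 3) ** 2)
print(best_x, best_ratio)  # 3 3.0
```

Because lam strictly increases whenever the subproblem value is positive, the loop terminates on a finite set; in the continuous setting of the paper, each subproblem would itself be solved approximately, which is why the overall design is only sub-optimal.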

Preprint · 2022 · arXiv

Mental State Classification Using Multi-graph Features

We consider the problem of extracting features from passive, multi-channel electroencephalogram (EEG) devices for downstream inference tasks related to high-level mental states such as stress and cognitive load. Our proposed method leverages recently developed multi-graph tools and applies them to the time series of graphs implied by the statistical dependence structure (e.g., correlation) amongst the multiple sensors. We compare the effectiveness of the proposed features to traditional band power-based features in the context of three classification experiments and find that the two feature sets offer complementary predictive information. We conclude by showing that the importance of particular channels and pairs of channels for classification when using the proposed features is neuroscientifically valid.
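The first step described above, turning a multi-channel recording into a time series of graphs from the statistical dependence between sensors, can be sketched with sliding-window Pearson correlations. This covers only that preprocessing step, not the multi-graph tools applied afterwards; the window length, step size, choice of Pearson correlation, and function names are all assumptions for illustration.

```python
import math


def pearson(a, b):
    # Pearson correlation of two equal-length sequences.
    # Assumes neither window is constant (otherwise the denominator is zero).
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_graph_series(channels, window, step):
    """Return one weighted adjacency matrix (channel-by-channel correlation) per window.

    channels: list of equal-length sample lists, one per sensor.
    """
    length = len(channels[0])
    graphs = []
    for start in range(0, length - window + 1, step):
        segs = [ch[start:start + window] for ch in channels]
        k = len(segs)
        adj = [[0.0] * k for _ in range(k)]  # no self-loops: diagonal stays 0
        for i in range(k):
            for j in range(i + 1, k):
                adj[i][j] = adj[j][i] = pearson(segs[i], segs[j])
        graphs.append(adj)
    return graphs


series = correlation_graph_series(
    [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [6, 5, 4, 3, 2, 1]], window=3, step=3
)
print(len(series))  # 2
```

Each matrix in the returned list is one graph in the time series; features for the downstream classifier would then be computed from this sequence.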

Preprint · 2021 · arXiv

Inducing a hierarchy for multi-class classification problems

In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure, so classical flat classifiers must be employed. In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers. These methods first cluster the class-conditional distributions and then use a hierarchical classifier with the induced hierarchy. We demonstrate the effectiveness of this class of methods, both for discovering a latent hierarchy and for improving accuracy, in principled simulation settings and three real-data applications.
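The first stage of this class of methods, clustering the class-conditional distributions, can be sketched by summarizing each class by its mean feature vector and merging classes agglomeratively. This is a deliberately crude stand-in: class means only approximate the conditional distributions, merged clusters are represented by the midpoint of their two centroids, and the second stage (training a classifier at each internal node) is omitted. The function names are hypothetical.

```python
import math


def class_means(X, y):
    # Mean feature vector per class label.
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def induce_hierarchy(X, y):
    """Repeatedly merge the two clusters with the closest centroids; return a nested-tuple tree."""
    clusters = [(label, mean) for label, mean in sorted(class_means(X, y).items())]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        merged = ((ti, tj), [(a + b) / 2 for a, b in zip(mi, mj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


X = [[0.0], [2.0], [4.0], [6.0], [20.0], [22.0], [25.0], [27.0]]
y = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(induce_hierarchy(X, y))  # (('a', 'b'), ('c', 'd'))
```

The resulting tree would then define which classifier is trained at each node: one separating {a, b} from {c, d}, and one inside each pair.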

Preprint · 2020 · arXiv

Covert Communications with Constrained Age of Information

In this letter, we consider the requirement of information freshness in covert communications for the first time. With artificial noise (AN) generated by a full-duplex (FD) receiver, we formulate a covertness maximization problem under an average age of information (AoI) constraint to optimize the transmit probability of the information signal. In particular, the transmit probability represents not only the generation rate of the information signal but also the prior probability of the alternative hypothesis in covert communications, which bridges information freshness and communication covertness. Our analysis shows that the optimal transmit probability is not always 0.5, which differs from the equal-prior-probability assumption in most related works on covert communications. Furthermore, the average AoI constraint enlarges the transmit probability at the cost of reduced covertness and leads to a positive lower bound on the information transmit power for non-zero covertness.

Preprint · 2020 · arXiv

Design of a Privacy-Preserving Data Platform for Collaboration Against Human Trafficking

Case records on victims of human trafficking are highly sensitive, yet the ability to share such data is critical to evidence-based practice and policy development across government, business, and civil society. We present new methods to anonymize, publish, and explore such data, implemented as a pipeline generating three artifacts: (1) synthetic data mitigating the privacy risk that published attribute combinations might be linked to known individuals or groups; (2) aggregate data mitigating the utility risk that synthetic data might misrepresent statistics needed for official reporting; and (3) visual analytics interfaces to both datasets mitigating the accessibility risk that privacy mechanisms or analysis tools might not be understandable and usable by all stakeholders. We present our work as a design study motivated by the goal of transforming how the world's largest database of identified victims is made available for global collaboration against human trafficking.

Preprint · 2020 · arXiv

Energy-Efficient Trajectory Design for UAV-Enabled Communication Under Malicious Jamming

In this letter, we investigate a UAV-enabled communication system in which a UAV is deployed to communicate with a ground node (GN) in the presence of multiple jammers. We aim to maximize the energy efficiency (EE) of the UAV by optimizing its trajectory, subject to the UAV's mobility constraints. However, the formulated problem is difficult to solve due to the non-convex and fractional form of the objective function. We therefore propose an iterative algorithm based on the successive convex approximation (SCA) technique and Dinkelbach's algorithm to solve it. Numerical results show that, through the optimized trajectory, the proposed algorithm strikes a better balance between throughput and energy consumption and thus significantly improves the EE compared with the benchmark algorithms.

Preprint · 2020 · arXiv

Robust and Secure Beamforming for Intelligent Reflecting Surface Aided mmWave MISO Systems

In this letter, we investigate robust and secure beamforming (RSBF) in an intelligent reflecting surface (IRS) aided millimeter wave (mmWave) multiple-input single-output (MISO) system, where multiple single-antenna eavesdroppers (Eves) are arbitrarily distributed near the legitimate receiver. Since the channel state information (CSI) of the Eves' channels is only imperfectly known at the legitimate transmitter, the RSBF design problems that maximize the worst-case achievable secrecy rate (ASR) are formulated under the total transmission power and unit-modulus constraints. Because these problems are difficult to solve optimally due to their non-convexity and coupled variables, we substitute the wiretap channels with a weighted combination of discrete samples and propose an RSBF scheme based on alternating optimization and semidefinite relaxation (SDR) techniques, for both colluding and non-colluding eavesdropping scenarios. Simulation results show that the proposed RSBF scheme can effectively improve the ASR and outperforms other benchmark schemes.

Preprint · 2020 · arXiv

Using Mobility for Electrical Load Forecasting During the COVID-19 Pandemic

The novel coronavirus (COVID-19) pandemic has posed unprecedented challenges for utilities and grid operators around the world. In this work, we focus on the problem of load forecasting. With strict social distancing restrictions, power consumption profiles around the world have shifted in both magnitude and daily patterns. These changes have caused significant difficulties in short-term load forecasting. Typical algorithms use weather, timing information, and previous consumption levels as input variables, yet they cannot capture large and sudden changes in socioeconomic behavior during the pandemic. In this paper, we introduce mobility as a measure of economic activity to complement the existing building blocks of forecasting algorithms. Mobility data act as a good proxy for population-level behavior during the implementation and subsequent easing of social distancing measures. The major challenge with such datasets is that only limited mobility records are associated with the recent pandemic. To overcome this small-data problem, we design a transfer learning scheme that enables knowledge transfer between several different geographical regions. This architecture leverages the diversity across these regions, and the resulting aggregated model can boost the algorithm's performance in each region's day-ahead forecast. Through simulations for regions in the US and Europe, we show that our proposed algorithm can outperform conventional forecasting methods by more than threefold. In addition, we demonstrate how the proposed model can be used to project how electricity consumption would recover under different mobility scenarios.

Preprint · 2015 · arXiv

Joint Relay and Jammer Selection Improves the Physical Layer Security in the Face of CSI Feedback Delays

We enhance the physical-layer security (PLS) of amplify-and-forward relaying networks with the aid of joint relay and jammer selection (JRJS), despite the deleterious effect of channel state information (CSI) feedback delays. Furthermore, we conceive a new outage-based characterization approach for the JRJS scheme. The traditional best relay selection (TBRS) is also considered as a benchmark. We first derive closed-form expressions for both the connection outage probability (COP) and the secrecy outage probability (SOP) for both the TBRS and JRJS schemes. Then, a reliable-and-secure connection probability (RSCP) is defined and analyzed to characterize the effect of the correlation between the COP and SOP introduced by the shared source-relay link. The reliability-security ratio (RSR) is introduced to characterize the relationship between reliability and security through asymptotic analysis. Moreover, the effective secrecy throughput is defined as the product of the secrecy rate and the RSCP in order to characterize the overall efficiency of the system, as determined by the transmit SNR, the secrecy codeword rate, and the power sharing ratio between the relay and the jammer. The impact of the direct source-eavesdropper link and additional performance comparisons with other related selection schemes are also included. Our numerical results show that the JRJS scheme outperforms the TBRS method in terms of both the RSCP and the effective secrecy throughput, but it is more sensitive to feedback delays. Increasing the transmit SNR does not always improve the overall throughput. Moreover, the RSR results demonstrate that upon reducing the CSI feedback delays, reliability improves more substantially than security degrades, implying an overall improvement in the security-reliability tradeoff.

Preprint · 2014 · arXiv

When Does Relay Transmission Give a More Secure Connection in Wireless Ad Hoc Networks?

Relay transmission can enhance coverage and throughput, but it can be vulnerable to eavesdropping attacks because the relay retransmits the source message. Thus, whether or not one should use relay transmission for secure communication is an interesting and important problem. In this paper, we consider the transmission of a confidential message from a source to a destination in a decentralized wireless network in the presence of randomly distributed eavesdroppers. The source-destination pair can potentially be assisted by randomly distributed relays. For an arbitrary relay, we derive exact expressions for the secure connection probability for both colluding and non-colluding eavesdroppers. We further obtain lower-bound expressions on the secure connection probability, which are accurate when the eavesdropper density is small. Using these lower-bound expressions, we propose a relay selection strategy to improve the secure connection probability. By analytically comparing the secure connection probability of direct transmission and relay transmission, we address the important problem of whether or not to relay and discuss the condition for relay transmission in terms of the relay density and source-destination distance. These analytical results are accurate in the small eavesdropper density regime.