Source author record

Jun Sakuma

Jun Sakuma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security Artificial Intelligence Computation and Language math.OC Methodology Neural and Evolutionary Computing Other Computer Science

Catalog footprint

What is connected

11works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input text. In this work, we show that this assumption is unnecessary and limiting. We introduce MetaBackdoor, a new class of backdoor attacks that exploits positional information as the trigger, without modifying textual content. Our key insight is that Transformer-based LLMs necessarily encode token positions to process ordered sequences. As a result, length-correlated positional structure is reflected in the model's internal computation and can be used as an effective non-content trigger signal. We demonstrate that even a simple length-based positional trigger is sufficient to activate stealthy backdoors. Unlike prior attacks, MetaBackdoor operates on visibly and semantically clean inputs and enables qualitatively new capabilities. We show that a backdoored LLM can be induced to disclose sensitive internal information, including proprietary system prompts, once a length condition is satisfied. We further demonstrate a self-activation scenario, where normal multi-turn interaction can move the conversation context into the trigger region and induce malicious tool-call behavior without attacker-supplied trigger text. In addition, MetaBackdoor is orthogonal to content-based backdoors and can be composed with them to create more precise and harder-to-detect activation conditions. Our results expand the threat model of LLM backdoors by revealing positional encoding as a previously overlooked attack surface. This challenges defenses that focus on detecting suspicious text and highlights the need for new defense strategies that explicitly account for positional triggers in modern LLM architectures.

preprint2023arXiv

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, lead to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for the optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches.

preprint2022arXiv

Black-Box Min--Max Continuous Optimization Using CMA-ES with Worst-case Ranking Approximation

In this study, we investigate the problem of min-max continuous optimization in a black-box setting $\min_{x} \max_{y}f(x,y)$. A popular approach updates $x$ and $y$ simultaneously or alternatingly. However, two major limitations have been reported in existing approaches. (I) As the influence of the interaction term between $x$ and $y$ (e.g., $x^\mathrm{T} B y$) on the Lipschitz smooth and strongly convex-concave function $f$ increases, the approaches converge to an optimal solution at a slower rate. (II) The approaches fail to converge if $f$ is not Lipschitz smooth and strongly convex-concave around the optimal solution. To address these difficulties, we propose minimizing the worst-case objective function $F(x)=\max_{y}f(x,y)$ directly using the covariance matrix adaptation evolution strategy, in which the rankings of solution candidates are approximated by our proposed worst-case ranking approximation (WRA) mechanism. Compared with existing approaches, numerical experiments show two important findings regarding our proposed method. (1) The proposed approach is efficient in terms of $f$-calls on a Lipschitz smooth and strongly convex-concave function with a large interaction term. (2) The proposed approach can converge on functions that are not Lipschitz smooth and strongly convex-concave around the optimal solution, whereas existing approaches fail.

preprint2022arXiv

Semi-Targeted Model Poisoning Attack on Federated Learning via Backward Error Analysis

Model poisoning attacks on federated learning (FL) intrude in the entire system via compromising an edge model, resulting in malfunctioning of machine learning models. Such compromised models are tampered with to perform adversary-desired behaviors. In particular, we considered a semi-targeted situation where the source class is predetermined however the target class is not. The goal is to cause the global classifier to misclassify data of the source class. Though approaches such as label flipping have been adopted to inject poisoned parameters into FL, it has been shown that their performances are usually class-sensitive varying with different target classes applied. Typically, an attack can become less effective when shifting to a different target class. To overcome this challenge, we propose the Attacking Distance-aware Attack (ADA) to enhance a poisoning attack by finding the optimized target class in the feature space. Moreover, we studied a more challenging situation where an adversary had limited prior knowledge about a client's data. To tackle this problem, ADA deduces pair-wise distances between different classes in the latent feature space from shared model parameters based on the backward error analysis. We performed extensive empirical evaluations on ADA by varying the factor of attacking frequency in three different image classification tasks. As a result, ADA succeeded in increasing the attack performance by 1.8 times in the most challenging case with an attacking frequency of 0.01.

preprint2020arXiv

Seasonal-adjustment Based Feature Selection Method for Large-scale Search Engine Logs

Search engine logs have a great potential in tracking and predicting outbreaks of infectious disease. More precisely, one can use the search volume of some search terms to predict the infection rate of an infectious disease in nearly real-time. However, conducting accurate and stable prediction of outbreaks using search engine logs is a challenging task due to the following two-way instability characteristics of the search logs. First, the search volume of a search term may change irregularly in the short-term, for example, due to environmental factors such as the amount of media or news. Second, the search volume may also change in the long-term due to the demographic change of the search engine. That is to say, if a model is trained with such search logs with ignoring such characteristic, the resulting prediction would contain serious mispredictions when these changes occur. In this work, we proposed a novel feature selection method to overcome this instability problem. In particular, we employ a seasonal-adjustment method that decomposes each time series into three components: seasonal, trend and irregular component and build prediction models for each component individually. We also carefully design a feature selection method to select proper search terms to predict each component. We conducted comprehensive experiments on ten different kinds of infectious diseases. The experimental results show that the proposed method outperforms all comparative methods in prediction accuracy for seven of ten diseases, in both now-casting and forecasting setting. Also, the proposed method is more successful in selecting search terms that are semantically related to target diseases.

preprint2020arXiv

Statistically Significant Pattern Mining with Ordinal Utility

Statistically significant patterns mining (SSPM) is an essential and challenging data mining task in the field of knowledge discovery in databases (KDD), in which each pattern is evaluated via a hypothesis test. Our study aims to introduce a preference relation into patterns and to discover the most preferred patterns under the constraint of statistical significance, which has never been considered in existing SSPM problems. We propose an iterative multiple testing procedure that can alternately reject a hypothesis and safely ignore the hypotheses that are less useful than the rejected hypothesis. One advantage of filtering out patterns with low utility is that it avoids consumption of the significance budget by rejection of useless (that is, uninteresting) patterns. This allows the significance budget to be focused on useful patterns, leading to more useful discoveries. We show that the proposed method can control the familywise error rate (FWER) under certain assumptions, that can be satisfied by a realistic problem class in SSPM.\@We also show that the proposed method always discovers a set of patterns that is at least equally or more useful than those discovered using the standard Tarone-Bonferroni method SSPM.\@Finally, we conducted several experiments with both synthetic and real-world data to evaluate the performance of our method. As a result, in the experiments with real-world datasets, the proposed method discovered a larger number of more useful patterns than the existing method for all five conducted tasks.

preprint2016arXiv

Efficiently Bounding Optimal Solutions after Small Data Modification in Large-Scale Empirical Risk Minimization

We study large-scale classification problems in changing environments where a small part of the dataset is modified, and the effect of the data modification must be quickly incorporated into the classifier. When the entire dataset is large, even if the amount of the data modification is fairly small, the computational cost of re-training the classifier would be prohibitively large. In this paper, we propose a novel method for efficiently incorporating such a data modification effect into the classifier without actually re-training it. The proposed method provides bounds on the unknown optimal classifier with the cost only proportional to the size of the data modification. We demonstrate through numerical experiments that the proposed method provides sufficiently tight bounds with negligible computational costs, especially when a small part of the dataset is modified in a large-scale classification problem.

preprint2016arXiv

Secure Approximation Guarantee for Cryptographically Private Empirical Risk Minimization

Privacy concern has been increasingly important in many machine learning (ML) problems. We study empirical risk minimization (ERM) problems under secure multi-party computation (MPC) frameworks. Main technical tools for MPC have been developed based on cryptography. One of limitations in current cryptographically private ML is that it is computationally intractable to evaluate non-linear functions such as logarithmic functions or exponential functions. Therefore, for a class of ERM problems such as logistic regression in which non-linear function evaluations are required, one can only obtain approximate solutions. In this paper, we introduce a novel cryptographically private tool called secure approximation guarantee (SAG) method. The key property of SAG method is that, given an arbitrary approximate solution, it can provide a non-probabilistic assumption-free bound on the approximation quality under cryptographically secure computation framework. We demonstrate the benefit of the SAG method by applying it to several problems including a practical privacy-preserving data analysis task on genomic and clinical information.

preprint2015arXiv

Differentially Private Analysis of Outliers

This paper investigates differentially private analysis of distance-based outliers. The problem of outlier detection is to find a small number of instances that are apparently distant from the remaining instances. On the other hand, the objective of differential privacy is to conceal presence (or absence) of any particular instance. Outlier detection and privacy protection are thus intrinsically conflicting tasks. In this paper, instead of reporting outliers detected, we present two types of differentially private queries that help to understand behavior of outliers. One is the query to count outliers, which reports the number of outliers that appear in a given subspace. Our formal analysis on the exact global sensitivity of outlier counts reveals that regular global sensitivity based method can make the outputs too noisy, particularly when the dimensionality of the given subspace is high. Noting that the counts of outliers are typically expected to be relatively small compared to the number of data, we introduce a mechanism based on the smooth upper bound of the local sensitivity. The other is the query to discovery top-$h$ subspaces containing a large number of outliers. This task can be naively achieved by issuing count queries to each subspace in turn. However, the variation of subspaces can grow exponentially in the data dimensionality. This can cause serious consumption of the privacy budget. For this task, we propose an exponential mechanism with a customized score function for subspace discovery. To the best of our knowledge, this study is the first trial to ensure differential privacy for distance-based outlier analysis. We demonstrated our methods with synthesized datasets and real datasets. The experimental results show that out method achieve better utility compared to the global sensitivity based methods.

preprint2015arXiv

Fairness-Aware Learning with Restriction of Universal Dependency using f-Divergences

Fairness-aware learning is a novel framework for classification tasks. Like regular empirical risk minimization (ERM), it aims to learn a classifier with a low error rate, and at the same time, for the predictions of the classifier to be independent of sensitive features, such as gender, religion, race, and ethnicity. Existing methods can achieve low dependencies on given samples, but this is not guaranteed on unseen samples. The existing fairness-aware learning algorithms employ different dependency measures, and each algorithm is specifically designed for a particular one. Such diversity makes it difficult to theoretically analyze and compare them. In this paper, we propose a general framework for fairness-aware learning that uses f-divergences and that covers most of the dependency measures employed in the existing methods. We introduce a way to estimate the f-divergences that allows us to give a unified analysis for the upper bound of the estimation error; this bound is tighter than that of the existing convergence rate analysis of the divergence estimation. With our divergence estimate, we propose a fairness-aware learning algorithm, and perform a theoretical analysis of its generalization error. Our analysis reveals that, under mild assumptions and even with enforcement of fairness, the generalization error of our method is $O(\sqrt{1/n})$, which is the same as that of the regular ERM. In addition, and more importantly, we show that, for any f-divergence, the upper bound of the estimation error of the divergence is $O(\sqrt{1/n})$. This indicates that our fairness-aware learning algorithm guarantees low dependencies on unseen samples for any dependency measure represented by an f-divergence.

preprint2015arXiv

Neutralized Empirical Risk Minimization with Generalization Neutrality Bound

Currently, machine learning plays an important role in the lives and individual activities of numerous people. Accordingly, it has become necessary to design machine learning algorithms to ensure that discrimination, biased views, or unfair treatment do not result from decision making or predictions made via machine learning. In this work, we introduce a novel empirical risk minimization (ERM) framework for supervised learning, neutralized ERM (NERM) that ensures that any classifiers obtained can be guaranteed to be neutral with respect to a viewpoint hypothesis. More specifically, given a viewpoint hypothesis, NERM works to find a target hypothesis that minimizes the empirical risk while simultaneously identifying a target hypothesis that is neutral to the viewpoint hypothesis. Within the NERM framework, we derive a theoretical bound on empirical and generalization neutrality risks. Furthermore, as a realization of NERM with linear classification, we derive a max-margin algorithm, neutral support vector machine (SVM). Experimental results show that our neutral SVM shows improved classification performance in real datasets without sacrificing the neutrality guarantee.

Jun Sakuma

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

Black-Box Min--Max Continuous Optimization Using CMA-ES with Worst-case Ranking Approximation

Semi-Targeted Model Poisoning Attack on Federated Learning via Backward Error Analysis

Seasonal-adjustment Based Feature Selection Method for Large-scale Search Engine Logs

Statistically Significant Pattern Mining with Ordinal Utility

Efficiently Bounding Optimal Solutions after Small Data Modification in Large-Scale Empirical Risk Minimization

Secure Approximation Guarantee for Cryptographically Private Empirical Risk Minimization

Differentially Private Analysis of Outliers

Fairness-Aware Learning with Restriction of Universal Dependency using f-Divergences

Neutralized Empirical Risk Minimization with Generalization Neutrality Bound