Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2026arXiv

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates, enjoy bounded memory, and static decision boundaries. Despite these, they still fail in online class-incremental learning due to two coupled miscalibrations: (i) their split criterion grows unreliable as the class count K expands, and (ii) the absence of knowledge transfer at split time. Both failures share a common root: the range of Information Gain intrinsically scales with log2 K. Consequently, any Hoeffding-style confidence radius derived from it must inevitably grow with the class count, making a K-independent split criterion structurally impossible, taking away the potential benefits of applying streaming decision trees to continual learning. To fix this issue, we present MIST (McDiarmid Incremental Streaming Tree), which resolves both failures through three integrated components: (i) a tight, K-independent McDiarmid confidence radius for Gini splitting that acts as a structural regulariser; (ii) a Bayesian inheritance protocol that projects parent statistics to child nodes via truncated-Gaussian moments, with variance reduction guarantees strongest precisely when splitting is most conservative; and (iii) per-leaf KLL quantile sketches that support both continuous threshold evaluation and geometry-adaptive leaf prediction from a single data structure. On standard and stress-test tabular streams, MIST is competitive with global parametric methods on near-Gaussian benchmarks and uniquely robust on non-Gaussian geometry where SOTA benchmarks collapse.

preprint2026arXiv

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

Large language models increasingly mediate decisions that turn on moral judgement, yet a growing body of evidence shows that their implicit preferences are not culturally neutral. Existing cultural alignment methods either require per-country preference data and fine-tuning budgets or assume white-box access to model internals that commercial APIs do not expose. In this work, we focus on this realistic black-box, public-data-only regime and observe that within-country sociodemographic disagreement, not consensus, is the primary steering signal. We introduce DISCA (Disagreement-Informed Steering for Cultural Alignment), an inference-time method that instantiates each country as a panel of World-Values-Survey-grounded persona agents and converts their disagreement into a bounded, loss-averse logit correction. Across 20 countries and 7 open-weight backbones (2B--70B), DISCA reduces cultural misalignment on MultiTP by 10--24% on the six backbones >=3.8B, and 2--7% on open-ended scenarios, without changing any weights. Our results suggest that inference-time calibration is a scalable alternative to fine-tuning for serving the long tail of global moral preferences.

preprint2026arXiv

Unlocking Compositional Generalization in Continual Few-Shot Learning

Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either collapse scenes into global embeddings, or train with part-level matching objectives that tie representations too closely to seen patterns, leaving them unable to generalize to truly novel concepts. In this paper, we identify this fundamental structural conflict and pioneer a new paradigm that strictly decouples representation learning from compositional inference. Leveraging the inherent patch-level semantic geometry of self-supervised Vision Transformers (ViTs), our framework employs a dual-phase strategy. During training, slot representations are optimized entirely toward holistic class identity, preserving highly generalizable, object-level geometries. At inference, preserved slots are dynamically composed to match novel scenes. We demonstrate that this paradigm offers dual structural benefits: The frozen backbone naturally prevents representation drift, while our lightweight, holistic optimization preserves the features' capacity for novel-concept transfer. Extensive experiments validate this approach, achieving state-of-the-art unseen-concept generalization and minimal forgetting across standard continual learning benchmarks.

preprint2026arXiv

Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery

Cross-view geo-localization (CVGL), which matches an oblique drone view to a geo-referenced satellite tile, has emerged as a key alternative for autonomous drone navigation when GNSS signals are jammed, spoofed, or unavailable. Despite strong recent progress, three limitations persist: (1) global-descriptor designs compress the patch grid into a single vector without separating layout from texture across the view gap; (2) altitude-related scale variation is retained in the learned embedding rather than marginalized; and (3) multi-objective training relies on hand-tuned scalars over losses on incompatible gradient scales. We propose SkyPart, a lightweight swappable head for patch-based vision transformers (ViTs) that institutes explicit part grouping over the patch grid. SkyPart has four theory-grounded components: (i) learnable prototypes competing for patch tokens via single-pass cosine assignment; (ii) altitude-conditioned linear modulation applied only during training, making the retrieval embedding altitude-free at inference; (iii) a graph-attention readout over active prototypes; and (iv) a Kendall uncertainty-weighted multi-objective loss whose stationary points are Pareto-stationary. At 26.95M parameters and 22.14 GFLOPs, SkyPart is the smallest among top-performing methods and sets a new state of the art on SUES-200, University-1652, and DenseUAV under a single-pass, no-re-ranking, no-TTA protocol. Its advantage over the strongest baseline widens under the ten-condition WeatherPrompt corruption benchmark.

preprint2025arXiv

Symmetric Linear Bandits with Hidden Symmetry

High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be available in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional case that covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner, and the correct symmetry needs to be learned in an online setting. We examine the structure of a collection of hidden symmetry and provide a method based on model selection within the collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $ O(d_0^{2/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension which is potentially very large, and $d_0$ is the dimension of the true low-dimensional subspace such that $d_0 \ll d$. With an extra assumption on well-separated models, we can further improve the regret to $ O(d_0\sqrt{T\log(d)} )$.

preprint2022arXiv

Playing Coopetitive Polymatrix Games with Small Manipulation Cost

Iterated coopetitive games capture the situation when one must efficiently balance between cooperation and competition with the other agents over time in order to win the game (e.g., to become the player with highest total utility). Achieving this balance is typically very challenging or even impossible when explicit communication is not feasible (e.g., negotiation or bargaining are not allowed). In this paper we investigate how an agent can achieve this balance to win in iterated coopetitive polymatrix games, without explicit communication. In particular, we consider a 3-player repeated game setting in which our agent is allowed to (slightly) manipulate the underlying game matrices of the other agents for which she pays a manipulation cost, while the other agents satisfy weak behavioural assumptions. We first propose a payoff matrix manipulation scheme and sequence of strategies for our agent that provably guarantees that the utility of any opponent would converge to a value we desire. We then use this scheme to design winning policies for our agent. We also prove that these winning policies can be found in polynomial running time. We then turn to demonstrate the efficiency of our framework in several concrete coopetitive polymatrix games, and prove that the manipulation costs needed to win are bounded above by small budgets. For instance, in the social distancing game, a polymatrix version of the lemonade stand coopetitive game, we showcase a policy with an infinitesimally small manipulation cost per round, along with a provable guarantee that, using this policy leads our agent to win in the long-run. Note that our findings can be trivially extended to $n$-player game settings as well (with $n > 3$).

preprint2022arXiv

Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

We study bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $Ω(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $Ω(\log T)$ amount of contamination to suffer regret $Ω(T)$. To combat such attacks, our second main contribution is to propose verification based mechanisms, which use limited verification to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we show that with $O(\log T)$ expected number of verifications, a simple modified version of the ETC type bandit algorithm can restore the order optimal $O(\log T)$ regret irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, that also enjoys full recovery from any attacks, also with $O(\log T)$ expected number of verifications. To derive a matching lower bound on the number of verifications, we prove that for any order-optimal bandit algorithm, this number of verifications $Ω(\log T)$ is necessary to recover the order-optimal regret. On the other hand, when the number of verifications is bounded above by a budget $B$, we propose a novel algorithm, Secure-BARBAR, which provably achieves $O(\min\{C,T/\sqrt{B} \})$ regret with high probability against weak attackers where $C$ is the total amount of contamination by the attacker, which breaks the known $Ω(C)$ lower bound of the non-verified setting if $C$ is large.

preprint2022arXiv

Sequential Blocked Matching

We consider a sequential blocked matching (SBM) model where strategic agents repeatedly report ordinal preferences over a set of services to a central planner. The planner's goal is to elicit agents' true preferences and design a policy that matches services to agents in order to maximize the expected social welfare with the added constraint that each matched service can be \emph{blocked} or unavailable for a number of time periods. Naturally, SBM models the repeated allocation of reusable services to a set of agents where each allocated service becomes unavailable for a fixed duration. We first consider the offline SBM setting, where the strategic agents are aware of their true preferences. We measure the performance of any policy by \emph{distortion}, the worst-case multiplicative approximation guaranteed by any policy. For the setting with $s$ services, we establish lower bounds of $Ω(s)$ and $Ω(\sqrt{s})$ on the distortions of any deterministic and randomised mechanisms, respectively. We complement these results by providing approximately truthful, measured by \emph{incentive ratio}, deterministic and randomised policies based on random serial dictatorship which match our lower bounds. Our results show that there is a significant improvement if one considers the class of randomised policies. Finally, we consider the online SBM setting with bandit feedback where each agent is initially unaware of her true preferences, and the planner must facilitate each agent in the learning of their preferences through the matching of services over time. We design an approximately truthful mechanism based on the Explore-then-Commit paradigm, which achieves logarithmic dynamic approximate regret.

preprint2022arXiv

Socialbots on Fire: Modeling Adversarial Behaviors of Socialbots via Multi-Agent Hierarchical Reinforcement Learning

Socialbots are software-driven user accounts on social platforms, acting autonomously (mimicking human behavior), with the aims to influence the opinions of other users or spread targeted misinformation for particular goals. As socialbots undermine the ecosystem of social platforms, they are often considered harmful. As such, there have been several computational efforts to auto-detect the socialbots. However, to our best knowledge, the adversarial nature of these socialbots has not yet been studied. This begs a question "can adversaries, controlling socialbots, exploit AI techniques to their advantage?" To this question, we successfully demonstrate that indeed it is possible for adversaries to exploit computational learning mechanism such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding being detected. We first formulate the adversarial socialbot learning as a cooperative game between two functional hierarchical RL agents. While one agent curates a sequence of activities that can avoid the detection, the other agent aims to maximize network influence by selectively connecting with right users. Our proposed policy networks train with a vast amount of synthetic graphs and generalize better than baselines on unseen real-life graphs both in terms of maximizing network influence (up to +18%) and sustainable stealthiness (up to +40% undetectability) under a strong bot detector (with 90% detection accuracy). During inference, the complexity of our approach scales linearly, independent of a network's structure and the virality of news. This makes our approach a practical adversarial attack when deployed in a real-life setting.

preprint2022arXiv

Temporal Multiresolution Graph Neural Networks For Epidemic Prediction

In this paper, we introduce Temporal Multiresolution Graph Neural Networks (TMGNN), the first architecture that both learns to construct the multiscale and multiresolution graph structures and incorporates the time-series signals to capture the temporal changes of the dynamic graphs. We have applied our proposed model to the task of predicting future spreading of epidemic and pandemic based on the historical time-series data collected from the actual COVID-19 pandemic and chickenpox epidemic in several European countries, and have obtained competitive results in comparison to other previous state-of-the-art temporal architectures and graph learning algorithms. We have shown that capturing the multiscale and multiresolution structures of graphs is important to extract either local or global information that play a critical role in understanding the dynamic of a global pandemic such as COVID-19 which started from a local city and spread to the whole world. Our work brings a promising research direction in forecasting and mitigating future epidemics and pandemics.

preprint2022arXiv

Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks to manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL and examines the potential damage of two natural types of poisoning attacks, i.e., the manipulation of \emph{reward} and \emph{action}. We discover that the effect of attacks crucially depend on whether the rewards are bounded or unbounded. In bounded reward settings, we show that only reward manipulation or only action manipulation cannot guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with $\tildeΘ(\sqrt{T})$ total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in unbounded reward settings, we show that reward manipulation attacks are sufficient for an adversary to successfully manipulate any order-optimal learning algorithm to follow any targeted policy using $\tilde{O}(\sqrt{T})$ amount of contamination. Our results reveal useful insights about what can or cannot be achieved by poisoning attacks, and are set to spur more works on the design of robust RL algorithms.

preprint2021arXiv

Fuzzy c-Means Clustering for Persistence Diagrams

Persistence diagrams concisely represent the topology of a point cloud whilst having strong theoretical guarantees, but the question of how to best integrate this information into machine learning workflows remains open. In this paper we extend the ubiquitous Fuzzy c-Means (FCM) clustering algorithm to the space of persistence diagrams, enabling unsupervised learning that automatically captures the topological structure of data without the topological prior knowledge or additional processing of persistence diagrams that many other techniques require. We give theoretical convergence guarantees that correspond to the Euclidean case, and empirically demonstrate the capability of our algorithm to capture topological information via the fuzzy RAND index. We end with experiments on two datasets that utilise both the topological and fuzzy nature of our algorithm: pre-trained model selection in machine learning and lattices structures from materials science. As pre-trained models can perform well on multiple tasks, selecting the best model is a naturally fuzzy problem; we show that fuzzy clustering persistence diagrams allows for model selection using the topology of decision boundaries. In materials science, we classify transformed lattice structure datasets for the first time, whilst the probabilistic membership values let us rank candidate lattices in a scenario where further investigation requires expensive laboratory time and expertise.

preprint2021arXiv

Sequential Choice Bandits with Feedback for Personalizing users' experience

In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users' experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward, which is a non-decreasing function of the action, if this action is below the user's threshold. Users are equipped with a patience budget, and actions that are above the threshold decrease the user's patience. When all patience is lost, the user abandons the platform. The platform attempts to learn the thresholds of the users in order to maximize its rewards, based on two different feedback models describing the information pattern available to the platform at each action. We define a notion of regret by determining the best action to be taken when the platform knows that the user's threshold is in a given interval. We then propose bandit algorithms for the two feedback models and show that upper and lower bounds on the regret are of the order of $\tilde{O}(N^{2/3})$ and $\tildeΩ(N^{2/3})$, respectively, where $N$ is the total number of users. Finally, we show that the waiting time of any user before receiving a personalized experience is uniform in $N$.

preprint2020arXiv

AED: An Anytime Evolutionary DCOP Algorithm

Evolutionary optimization is a generic population-based metaheuristic that can be adapted to solve a wide variety of optimization problems and has proven very effective for combinatorial optimization problems. However, the potential of this metaheuristic has not been utilized in Distributed Constraint Optimization Problems (DCOPs), a well-known class of combinatorial optimization problems prevalent in Multi-Agent Systems. In this paper, we present a novel population-based algorithm, Anytime Evolutionary DCOP (AED), that uses evolutionary optimization to solve DCOPs. In AED, the agents cooperatively construct an initial set of random solutions and gradually improve them through a new mechanism that considers an optimistic approximation of local benefits. Moreover, we present a new anytime update mechanism for AED that identifies the best among a distributed set of candidate solutions and notifies all the agents when a new best is found. In our theoretical analysis, we prove that AED is anytime. Finally, we present empirical results indicating AED outperforms the state-of-the-art DCOP algorithms in terms of solution quality.

preprint2020arXiv

Learning Optimal Temperature Region for Solving Mixed Integer Functional DCOPs

Distributed Constraint Optimization Problems (DCOPs) are an important framework for modeling coordinated decision-making problems in multi-agent systems with a set of discrete variables. Later works have extended DCOPs to model problems with a set of continuous variables, named Functional DCOPs (F-DCOPs). In this paper, we combine both of these frameworks into the Mixed Integer Functional DCOP (MIF-DCOP) framework that can deal with problems regardless of their variables' type. We then propose a novel algorithm $-$ Distributed Parallel Simulated Annealing (DPSA), where agents cooperatively learn the optimal parameter configuration for the algorithm while also solving the given problem using the learned knowledge. Finally, we empirically evaluate our approach in DCOP, F-DCOP, and MIF-DCOP settings and show that DPSA produces solutions of significantly better quality than the state-of-the-art non-exact algorithms in their corresponding settings.

preprint2020arXiv

Zealotry and Influence Maximization in the Voter Model: When to Target Zealots?

In this paper, we study influence maximization in the voter model in the presence of biased voters (or zealots) on complex networks. Under what conditions should an external controller with finite budget who aims at maximizing its influence over the system target zealots? Our analysis, based on both analytical and numerical results, shows a rich diagram of preferences and degree-dependencies of allocations to zealots and normal agents varying with the budget. We find that when we have a large budget or for low levels of zealotry, optimal strategies should give larger allocations to zealots and allocations are positively correlated with node degree. In contrast, for low budgets or highly-biased zealots, optimal strategies give higher allocations to normal agents, with some residual allocations to zealots, and allocations to both types of agents decrease with node degree. Our results emphasize that heterogeneity in agent properties strongly affects strategies for influence maximization on heterogeneous networks.

preprint2012arXiv

Knapsack based Optimal Policies for Budget-Limited Multi-Armed Bandits

In budget-limited multi-armed bandit (MAB) problems, the learner's actions are costly and constrained by a fixed budget. Consequently, an optimal exploitation policy may not be to pull the optimal arm repeatedly, as is the case in other variants of MAB, but rather to pull the sequence of different arms that maximises the agent's total reward within the budget. This difference from existing MABs means that new approaches to maximising the total reward are required. Given this, we develop two pulling policies, namely: (i) KUBE; and (ii) fractional KUBE. Whereas the former provides better performance up to 40% in our experimental settings, the latter is computationally less expensive. We also prove logarithmic upper bounds for the regret of both policies, and show that these bounds are asymptotically optimal (i.e. they only differ from the best possible regret by a constant factor).