Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
16 topics
4 close collaborators

Published work

13 published items

preprint, 2026, arXiv

How Good is Post-Hoc Watermarking With Language Model Rephrasing?

Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking*, where an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents or to detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
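To make the "simple Gumbel-max scheme" mentioned above concrete, here is a minimal sketch of the commonly described Gumbel-max watermark: the next token is chosen as the argmax of u_i^(1/p_i), where the uniforms u_i are derived from a secret key and the preceding tokens, and detection averages -log(1 - u) over the observed tokens. This is not the paper's implementation. The hashing scheme, the two-token context window, the key strings, and the toy random "model" are all assumptions made for illustration.

```python
import hashlib
import math
import random

def keyed_uniforms(key, context, vocab_size):
    """One pseudo-random uniform in (0, 1) per token, seeded by key and context."""
    values = []
    for token in range(vocab_size):
        digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
        n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
        values.append((n + 0.5) / 2**52)             # strictly inside (0, 1)
    return values

def sample_watermarked(probs, key, context):
    """Pick argmax_i u_i ** (1 / p_i), computed as argmax_i log(u_i) / p_i."""
    u = keyed_uniforms(key, context, len(probs))
    candidates = [i for i, p in enumerate(probs) if p > 0]
    return max(candidates, key=lambda i: math.log(u[i]) / probs[i])

def detection_score(tokens, key, vocab_size, window):
    """Mean of -log(1 - u_token); close to 1 for text independent of the key."""
    total = 0.0
    for t in range(window, len(tokens)):
        context = tuple(tokens[t - window:t])
        u = keyed_uniforms(key, context, vocab_size)
        total += -math.log(1.0 - u[tokens[t]])
    return total / (len(tokens) - window)

def toy_next_token_probs(rng, vocab_size):
    """Hypothetical stand-in for an LLM: a random next-token distribution."""
    weights = [rng.random() for _ in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]

VOCAB, WINDOW = 20, 2
rng = random.Random(0)
tokens = [0, 1]  # arbitrary starting context
for _ in range(300):
    probs = toy_next_token_probs(rng, VOCAB)
    context = tuple(tokens[-WINDOW:])
    tokens.append(sample_watermarked(probs, "secret-key", context))

score_right_key = detection_score(tokens, "secret-key", VOCAB, WINDOW)
score_wrong_key = detection_score(tokens, "other-key", VOCAB, WINDOW)
print(score_right_key, score_wrong_key)
```

Scoring with the generating key should give a clearly higher average than scoring with an unrelated key, which is the signal a detector thresholds on. In the post-hoc setting the token distributions would come from a rephrasing LLM conditioned on the original text rather than from the toy generator used here.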

preprint, 2026, arXiv

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.

preprint, 2022, arXiv

Balanced supersaturation for some degenerate hypergraphs

A classical theorem of Simonovits from the 1980s asserts that every graph $G$ satisfying ${e(G) \gg v(G)^{1+1/k}}$ must contain $\gtrsim \left(\frac{e(G)}{v(G)}\right)^{2k}$ copies of $C_{2k}$. Recently, Morris and Saxton established a balanced version of Simonovits' theorem, showing that such $G$ has $\gtrsim \left(\frac{e(G)}{v(G)}\right)^{2k}$ copies of $C_{2k}$, which are 'uniformly distributed' over the edges of $G$. Moreover, they used this result to obtain a sharp bound on the number of $C_{2k}$-free graphs via the container method. In this paper, we generalise Morris-Saxton's results for even cycles to $\Theta$-graphs. We also prove analogous results for complete $r$-partite $r$-graphs.

preprint, 2022, arXiv

Exponential decay of intersection volume with applications on list-decodability and Gilbert-Varshamov type bound

We give some natural sufficient conditions for balls in a metric space to have small intersection. Roughly speaking, this happens when the metric space is (i) expanding and (ii) well-spread, and (iii) a certain random variable on the boundary of a ball has a small tail. As applications, we show that the volume of intersection of balls in Hamming spaces, Johnson spaces, and symmetric groups decays exponentially as their centers drift apart. To verify condition (iii), we prove some large deviation inequalities 'on a slice' for functions with Lipschitz conditions. We then use these estimates on intersection volumes to (a) obtain a sharp lower bound on list-decodability of random $q$-ary codes, confirming a conjecture of Li and Wootters; and (b) improve the classical bound of Levenshtein from 1971 on constant weight codes by a factor linear in dimension, resolving a problem raised by Jiang and Vardy. Our probabilistic point of view also offers a unified framework to obtain improvements on other Gilbert--Varshamov type bounds, giving conceptually simple and calculation-free proofs for $q$-ary codes, permutation codes, and spherical codes. Another consequence is a counting result on the number of codes, showing ampleness of large codes.

preprint, 2022, arXiv

On some studies of Fraud Detection Pipeline and related issues from the scope of Ensemble Learning and Graph-based Learning

The UK anti-fraud charity Fraud Advisory Panel (FAP), in its 2016 review, estimated the cost of fraud to businesses at 144 billion and the cost to individuals at 9.7 billion. Banking, insurance, manufacturing, and government are the sectors most commonly affected by fraud. An efficient fraud detection system could prevent these losses; however, building one is challenging because of difficult problems such as imbalanced data and computing costs. Over the last three decades there has been a great deal of research on fraud detection, but no agreement on the best approach to building a fraud detection system. In this thesis, we aim to answer the following questions: i) how to build a simple and effective fraud detection system that is easy to implement and provides reliable results (our proposed Fraud Detection Pipeline is a potential backbone of such a system and is easy to extend or upgrade); ii) when to update the models in the system, keeping accuracy stable while reducing the cost of the updating process; iii) how to deal with extreme imbalance in big-data classification problems such as fraud detection, since this lies in the gap between two difficult problems; and iv) how to apply graph-based semi-supervised learning to detect fraudulent transactions.

preprint, 2022, arXiv

Text classification problems via BERT embedding method and graph convolutional neural network

This paper presents a novel way of combining the BERT embedding method and the graph convolutional neural network to solve the text classification problem. First, we apply the BERT embedding method to the texts (in the BBC news dataset and the IMDB movie reviews dataset) to transform them into numerical vectors. Then, the graph convolutional neural network is applied to these numerical vectors to classify the texts into their appropriate classes/labels. Experiments show that the graph convolutional neural network model performs better than combinations of the BERT embedding method with classical machine learning models.

preprint, 2022, arXiv

Two problems in graph Ramsey theory

We study two problems in graph Ramsey theory. In the early 1970s, Erdős and O'Neil considered a generalization of Ramsey numbers. Given integers $n,k,s$ and $t$ with $n \ge k \ge s,t \ge 2$, they asked for the least integer $N=f_k(n,s,t)$ such that in any red-blue coloring of the $k$-subsets of $\{1, 2,\ldots, N\}$, there is a set of size $n$ such that either each of its $s$-subsets is contained in some red $k$-subset, or each of its $t$-subsets is contained in some blue $k$-subset. Erdős and O'Neil found an exact formula for $f_k(n,s,t)$ when $k\ge s+t-1$. In the arguably more interesting case where $k=s+t-2$, they showed $2^{-\binom{k}{2}}n<\log f_k(n,s,t)<2n$ for sufficiently large $n$. Our main result closes the gap between these lower and upper bounds, determining the logarithm of $f_{s+t-2}(n,s,t)$ up to a multiplicative factor. Recently, Damásdi, Keszegh, Malec, Tompkins, Wang and Zamora initiated the investigation of saturation problems in Ramsey theory, wherein one seeks to minimize $n$ such that there exists an $r$-edge-coloring of $K_n$ for which any extension of this to an $r$-edge-coloring of $K_{n+1}$ would create a new monochromatic copy of $K_k$. We obtain essentially sharp bounds for this problem.

preprint, 2022, arXiv

Understanding Public Opinion on Using Hydroxychloroquine for COVID-19 Treatment via Social Media

Hydroxychloroquine (HCQ) is used to prevent or treat malaria caused by mosquito bites. Recently, the drug has been suggested as a treatment for COVID-19, but that suggestion has not been supported by scientific evidence. Information about the drug's efficacy has flooded social networks, posing potential threats to the community by distorting perceptions of that efficacy. This paper studies the reactions of social network users to the recommendation of using HCQ for COVID-19 treatment by analyzing the reaction patterns and sentiment of tweets. We collected 164,016 tweets from February to December 2020 and used a text mining approach to identify social reaction patterns and opinion change over time. Our descriptive analysis identified an irregularity in the users' reaction patterns that is tightly associated with social and news feeds on the development of HCQ and COVID-19 treatment. The study linked the tweets and Google search frequencies to reveal the viewpoints of local communities on the use of HCQ for COVID-19 treatment across different states. Further, our tweet sentiment analysis reveals that public opinion changed significantly over time regarding the recommendation of using HCQ for COVID-19 treatment. The data showed high support in the early period, but support declined significantly in October. Finally, using the manual classification of 4,850 tweets by humans as our benchmark, our sentiment analysis showed that the Google Cloud Natural Language algorithm outperformed the Valence Aware Dictionary and sEntiment Reasoner in classifying tweets, especially in the sarcastic tweet group.

preprint, 2022, arXiv

Viscous droplet impingement on soft substrates

Viscous droplets impinging on soft substrates may exhibit several distinct behaviours including repeated bouncing, wetting, and hovering, i.e., spreading and retracting after impact without bouncing back or wetting. We experimentally study the conditions enabling these characteristic behaviours by systematically varying the substrate elasticity, impact velocity and the liquid viscosity. For each substrate elasticity, the transition to wetting is determined as the dependence of the Weber number We, which measures the droplet's kinetic energy against its surface energy, on the Ohnesorge number Oh, which compares viscosity to inertia and capillarity. We find that while We at the wetting transition monotonically decreases with Oh for relatively rigid substrates, it exhibits a counter-intuitive behaviour in which it first increases then gradually decreases for softer substrates. We experimentally determine the dependence of the maximum Weber number allowing non-wetting impacts on the substrate elasticity and show that it provides an excellent quantitative measure of liquid repellency for a wide range of surfaces, from liquid to soft surfaces and non-deformable surfaces.
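For readers unfamiliar with the two dimensionless groups above, a small sketch of their standard textbook definitions may help: We = ρ v² D / σ and Oh = μ / sqrt(ρ σ D). The choice of the droplet diameter D as the length scale and the sample fluid properties (roughly water at room temperature) are assumptions for illustration; the abstract does not state which convention or liquids the paper uses.

```python
import math

def weber_number(density, velocity, diameter, surface_tension):
    """We = rho * v^2 * D / sigma: kinetic energy relative to surface energy."""
    return density * velocity**2 * diameter / surface_tension

def ohnesorge_number(viscosity, density, diameter, surface_tension):
    """Oh = mu / sqrt(rho * sigma * D): viscosity relative to inertia and capillarity."""
    return viscosity / math.sqrt(density * surface_tension * diameter)

# Illustrative SI values for a 2 mm water-like droplet at 1 m/s (not from the paper).
rho = 1000.0    # density, kg/m^3
mu = 1.0e-3     # dynamic viscosity, Pa*s
sigma = 0.072   # surface tension, N/m
D = 2.0e-3      # droplet diameter, m
v = 1.0         # impact velocity, m/s

we = weber_number(rho, v, D, sigma)
oh = ohnesorge_number(mu, rho, D, sigma)
print(f"We = {we:.2f}, Oh = {oh:.5f}")
```

Raising the liquid viscosity at fixed size and speed increases Oh while leaving We unchanged, which is how the two axes of the wetting-transition curve can be varied independently.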

preprint, 2020, arXiv

Demonstrating Immersive Media Delivery on 5G Broadcast and Multicast Testing Networks

This work presents eight demonstrators and one showcase developed within the 5G-Xcast project. They experimentally demonstrate and validate key technical enablers for the future of media delivery, associated with multicast and broadcast communication capabilities in 5th Generation (5G). In 5G-Xcast, three existing testbeds, IRT in Munich (Germany), 5GIC in Surrey (UK), and TUAS in Turku (Finland), have been developed into 5G broadcast and multicast testing networks, enabling us to demonstrate our vision of a converged 5G infrastructure with fixed and mobile accesses and terrestrial broadcast, delivering immersive audio-visual media content. Built upon the improved testing networks, the demonstrators and showcase developed in 5G-Xcast show the impact of the technology developed in the project. Our demonstrations predominantly cover use cases belonging to two verticals: Media & Entertainment and Public Warning, which are future 5G scenarios relevant to multicast and broadcast delivery. In this paper, we present the development of these demonstrators, the showcase, and the testbeds. We also provide key findings from the experiments and demonstrations, which not only validate the technical solutions developed in the project, but also illustrate the potential technical impact of these solutions for broadcasters, content providers, operators, and other industries interested in the future of immersive media delivery.

preprint, 2020, arXiv

Predicting Sample Collision with Neural Networks

Many state-of-the-art robotics applications require fast and efficient motion planning algorithms. Existing motion planning methods become less effective as the dimensionality of the robot and its workspace increases, largely because of the computational cost of collision detection routines. In this work, we present a framework to address the cost of expensive primitive operations in sampling-based motion planning. This framework determines the validity of a sample robot configuration through a novel combination of a Contractive AutoEncoder (CAE), which captures an occupancy grid representation of the robot's workspace, and a Multilayer Perceptron, which efficiently predicts the collision state of the robot from the CAE and the robot's configuration. We evaluate our framework on multiple planning problems with a variety of robots in 2D and 3D workspaces. The results show that (1) the framework is computationally efficient in all investigated problems, and (2) the framework generalizes well to new workspaces.

preprint, 2020, arXiv

Predictive Probability Path Planning Model For Dynamic Environments

Path planning in dynamic environments is essential to high-risk applications such as unmanned aerial vehicles, self-driving cars, and autonomous underwater vehicles. In this paper, we generate collision-free trajectories for a robot within any given environment with temporal and spatial uncertainties caused by randomly moving obstacles. We use two Poisson distributions to model the movements of obstacles across the generated trajectory of a robot in both space and time to determine the probability of collision with an obstacle. Obstacles are avoided by intelligently adjusting the speed of the robot at space-time intervals where a larger number of obstacles intersect the trajectory of the robot. Our method potentially reduces the use of computationally expensive collision detection libraries. In our experiments, the method shows a significant improvement over existing methods in terms of safety, accuracy, execution time and computational cost. Our results show a high level of agreement between the predicted and actual number of collisions with moving obstacles.
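The abstract does not spell out the model, so the following is only a sketch of the basic Poisson fact such an approach can rest on: if the number of obstacles crossing a trajectory interval is Poisson with mean λ, the probability of at least one crossing is 1 - exp(-λ). The per-interval rates, the risk threshold, and the speed-halving rule are hypothetical and are not taken from the paper.

```python
import math

def collision_probability(expected_crossings):
    """P(N >= 1) for N ~ Poisson(lambda), which equals 1 - exp(-lambda)."""
    return 1.0 - math.exp(-expected_crossings)

def choose_speed(nominal_speed, expected_crossings, risk_threshold=0.5):
    """Hypothetical rule: halve the speed where the crossing risk exceeds the threshold."""
    risky = collision_probability(expected_crossings) > risk_threshold
    return nominal_speed * 0.5 if risky else nominal_speed

# Hypothetical expected obstacle crossings for four consecutive intervals.
rates = [0.1, 0.4, 1.2, 0.05]
risks = [collision_probability(lam) for lam in rates]
speeds = [choose_speed(2.0, lam) for lam in rates]
print(risks, speeds)
```

Only the third interval exceeds the threshold here, so only that interval gets a reduced speed. A real planner would also have to check that changing speed actually moves the robot out of the obstacle's space-time path.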

preprint, 2020, arXiv

Vibration-induced actuation of droplets on microstructured surfaces

When a liquid droplet impacts a vibrated micro-structured surface with asymmetric topology, the droplet moves horizontally while bouncing. This motion is observed when the liquid is in contact with a low-surface-energy (e.g. hydrophobic) surface, and over a wide range of amplitudes and frequencies. We propose that the direction of motion is set by a force exerted by the unbalanced vapor flow in the contact region between the solid and the liquid, which arises from the asymmetric geometry. We observe the levitation and movement dynamics of the droplet impacting a vibrated micro-structured surface to reveal the processes responsible for the transitional regimes between moving, unmoved, and broken droplets as the vibration amplitude and frequency increase. Based on the insight provided by the experiment and on the analysis of the kinetic energy of the droplet, we develop a quantitative model for the dynamic movement and its dependence on the vibration characteristics.