Source author record

Luca Arnaboldi

Luca Arnaboldi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security Artificial Intelligence q-fin.TR Computation and Language cond-mat.dis-nn eess.SY Logic in Computer Science Programming Languages Software Engineering Systems and Control

Catalog footprint

What is connected

9works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers

Guardrail Classifiers defend production language models against harmful behavior, but although results seem promising in testing, they provide no formal guarantees. Providing formal guarantees for such models is hard because "harmful behavior" has no natural specification in a discrete input space: and the standard epsilon-ball properties used in other domains do not carry semantic meaning. We close this gap by shifting verification from the discrete input space to the classifier's pre-activation space, where we define a harmful region as a convex shape enclosing the representations of known harmful prompts. Because the sigmoid classification head is monotonic, certifying the worst-case point is sufficient to certify the entire region, yielding a closed-form soundness proof without approximation in O(d) time. To formally evaluate these classifiers, we propose two constructions of such regions: SVD-aligned hyper-rectangles, which yield exact SAT/UNSAT certificates, and Gaussian Mixture Models, which yield probabilistic certificates over semantically coherent clusters. Applying this framework to three author-trained Guardrail Classifiers on the toxicity domain, every hyper-rectangle configuration returns SAT, exposing verifiable safety holes across all classifiers, despite seemingly high empirical metrics. Probabilistic GMM certificates also expose a divergent structural stability in how these models represent harm. While GPT-2 and Llama-3.1-8B maintain robust coverage of 90% and 80% across varying boundaries, BERT's safety guarantees prove uniquely volatile. This 'coverage collapse' to 55% at the optimal threshold reveals a sparsely populated safety margin in BERT, which only achieves full coverage by adopting an extremely conservative pessimistic threshold. These approaches combined, provide new insights on how effective Guardrail Classifiers really are, beyond traditional red-teaming.

preprint2026arXiv

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

Understanding how deep neural networks learn useful internal representations from data remains a central open problem in the theory of deep learning. We introduce Neural Low-Degree Filtering (Neural LoFi), a stylized limit of gradient-based training in which hierarchical feature learning becomes an explicit iterative spectral procedure. In this limit, the dynamics at each layer decouple: given the current representation, the next layer selects directions with maximal accessible low-degree correlation to the label. This yields a tractable surrogate mechanism for deep learning, together with a natural kernel-space interpretation. Neural LoFi provides a mathematically explicit framework for studying multi-layer feature learning beyond the lazy regime. It predicts how representations are selected layer by layer, explains how emergence of concepts arises with given sample complexity,and gives a concrete mechanism by which depth progressively constructs new features from old ones through low-degree compositionality. We complement the theory with mechanistic experiments on fully connected and convolutional architectures, showing that Neural LoFi improves over lazy random-feature baselines, recovers meaningful structured filters, and predicts representations aligned with early gradient-descent feature discovery with real datasets.

preprint2022arXiv

A Generic Methodology for the Statistically Uniform & Comparable Evaluation of Automated Trading Platform Components

Although machine learning approaches have been widely used in the field of finance, to very successful degrees, these approaches remain bespoke to specific investigations and opaque in terms of explainability, comparability, and reproducibility. The primary objective of this research was to shed light upon this field by providing a generic methodology that was investigation-agnostic and interpretable to a financial markets practitioner, thus enhancing their efficiency, reducing barriers to entry, and increasing the reproducibility of experiments. The proposed methodology is showcased on two automated trading platform components. Namely, price levels, a well-known trading pattern, and a novel 2-step feature extraction method. The methodology relies on hypothesis testing, which is widely applied in other social and scientific disciplines to effectively evaluate the concrete results beyond simple classification accuracy. The main hypothesis was formulated to evaluate whether the selected trading pattern is suitable for use in the machine learning setting. Across the experiments we found that the use of the considered trading pattern in the machine learning setting is only partially supported by statistics, resulting in insignificant effect sizes (Rebound 7 - $0.64 \pm 1.02$, Rebound 11 $0.38 \pm 0.98$, and rebound 15 - $1.05 \pm 1.16$), but allowed the rejection of the null hypothesis. We showcased the generic methodology on a US futures market instrument and provided evidence that with this methodology we could easily obtain informative metrics beyond the more traditional performance and profitability metrics. This work is one of the first in applying this rigorous statistically-backed approach to the field of financial markets and we hope this may be a springboard for more research.

preprint2022arXiv

Automating Cryptographic Protocol Language Generation from Structured Specifications

Security of cryptographic protocols can be analysed by creating a model in a formal language and verifying the model in a tool. All such tools focus on the last part of the analysis, verification, and the interpretation of the specification is only explained in papers. Rather, we focus on the interpretation and modelling part by presenting a tool to aid the cryptographer throughout the process and automatically generating code in a target language. We adopt a data-centric approach where the protocol design is stored in a structured way rather than as textual specifications. Previous work shows how this approach facilitates the interpretation to a single language (for Tamarin) which required aftermath modifications. By improving the expressiveness of the specification data structure we extend the tool to export to an additional formal language, ProVerif, as well as a C++ fully running implementation. Furthermore, we extend the plugins to verify correctness in ProVerif and executability lemmas in Tamarin. In this paper we model the Diffie-Hellman key exchange, which is traditionally used as a case study; a demo is also provided for other commonly studied protocols, Needham- Schroeder and Needham-Schroeder-Lowe.

preprint2022arXiv

Towards Interdependent Safety Security Assessments using Bowties

We present a way to combine security and safety assessments using Bowtie Diagrams. Bowties model both the causes leading up to a central failure event and consequences which arise from that event, as well as barriers which impede events. Bowties have previously been used separately for security and safety assessments, but we suggest that a unified treatment in a single model can elegantly capture safety-security interdependencies of several kinds. We showcase our approach with the example of the October 2021 Facebook DNS shutdown, examining the chains of events and the interplay between the security and safety barriers which caused the outage.

preprint2022arXiv

Vehicle: Interfacing Neural Network Verifiers with Interactive Theorem Provers

Verification of neural networks is currently a hot topic in automated theorem proving. Progress has been rapid and there are now a wide range of tools available that can verify properties of networks with hundreds of thousands of nodes. In theory this opens the door to the verification of larger control systems that make use of neural network components. However, although work has managed to incorporate the results of these verifiers to prove larger properties of individual systems, there is currently no general methodology for bridging the gap between verifiers and interactive theorem provers (ITPs). In this paper we present Vehicle, our solution to this problem. Vehicle is equipped with an expressive domain specific language for stating neural network specifications which can be compiled to both verifiers and ITPs. It overcomes previous issues with maintainability and scalability in similar ITP formalisations by using a standard ONNX file as the single canonical representation of the network. We demonstrate its utility by using it to connect the neural network verifier Marabou to Agda and then formally verifying that a car steered by a neural network never leaves the road, even in the face of an unpredictable cross wind and imperfect sensors. The network has over 20,000 nodes, and therefore this proof represents an improvement of 3 orders of magnitude over prior proofs about neural network enhanced systems in ITPs.

preprint2022arXiv

Volume-Centred Range Bars: Novel Interpretable Representation of Financial Markets Designed for Machine Learning Applications

Financial markets are a source of non-stationary multidimensional time series which has been drawing attention for decades. Each financial instrument has its specific changing-over-time properties, making its analysis a complex task. Hence, improvement of understanding and development of more informative, generalisable market representations are essential for the successful operation in financial markets, including risk assessment, diversification, trading, and order execution. In this study, we propose a volume-price-based market representation for making financial time series more suitable for machine learning pipelines. We use a statistical approach for evaluating the representation. Through the research questions, we investigate, i) whether the proposed representation allows the more efficient design of machine learning models; ii) whether the proposed representation leads to increased performance over the price levels market pattern; iii) whether the proposed representation performs better on the liquid markets, and iv) whether SHAP feature interactions are reliable to be used in the considered setting. Our analysis shows that the proposed volume-based method allows successful classification of the financial time series patterns, and also leads to better classification performance than the price levels-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models and compare the outcomes to those of the SHAP method. This results in the significant similarity between the two methods, hence we claim that SHAP feature interactions are reliable to be used in the setting of financial markets.

preprint2022arXiv

Why Robust Natural Language Understanding is a Challenge

With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying networks' general properties of robustness have been proposed, but they are mostly applied in Computer Vision. In this paper we propose a Verification specification for Natural Language Understanding classification based on larger regions of interest, and we discuss the challenges of such task. We observe that, although the data is almost linearly separable, the verifier struggles to output positive results and we explain the problems and implications.

preprint2019arXiv

Generating Synthetic Data for Real World Detection of DoS Attacks in the IoT

Denial of service attacks are especially pertinent to the internet of things as devices have less computing power, memory and security mechanisms to defend against them. The task of mitigating these attacks must therefore be redirected from the device onto a network monitor. Network intrusion detection systems can be used as an effective and efficient technique in internet of things systems to offload computation from the devices and detect denial of service attacks before they can cause harm. However the solution of implementing a network intrusion detection system for internet of things networks is not without challenges due to the variability of these systems and specifically the difficulty in collecting data. We propose a model-hybrid approach to model the scale of the internet of things system and effectively train network intrusion detection systems. Through bespoke datasets generated by the model, the IDS is able to predict a wide spectrum of real-world attacks, and as demonstrated by an experiment construct more predictive datasets at a fraction of the time of other more standard techniques.

Luca Arnaboldi

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

A Generic Methodology for the Statistically Uniform & Comparable Evaluation of Automated Trading Platform Components

Automating Cryptographic Protocol Language Generation from Structured Specifications

Towards Interdependent Safety Security Assessments using Bowties

Vehicle: Interfacing Neural Network Verifiers with Interactive Theorem Provers

Volume-Centred Range Bars: Novel Interpretable Representation of Financial Markets Designed for Machine Learning Applications

Why Robust Natural Language Understanding is a Challenge

Generating Synthetic Data for Real World Detection of DoS Attacks in the IoT