Researcher profile

Marlon Dumas

Marlon Dumas contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
7 topics
4 close collaborators

Published work

14 published items

preprint, 2026, arXiv

Exploring LLM Features in Predictive Process Monitoring for Small-Scale Event-Logs

Predictive Process Monitoring is a branch of process mining that aims to predict the outcome of an ongoing process. Recently, it has leveraged machine learning and deep learning architectures. In this paper, we extend our prior LLM-based Predictive Process Monitoring framework, which was initially focused on total time prediction via prompting. The extension consists of comprehensively evaluating its generality, semantic leverage, and reasoning mechanisms, also across multiple Key Performance Indicators. Empirical evaluations conducted on three distinct event logs and across the Key Performance Indicators of Total Time and Activity Occurrence prediction indicate that, in data-scarce settings with only 100 traces, the LLM surpasses the benchmark methods. Furthermore, the experiments show that the LLM exploits both its embodied prior knowledge and the internal correlations among training traces. Finally, we examine the reasoning strategies employed by the model, demonstrating that the LLM does not merely replicate existing predictive methods but performs higher-order reasoning to generate the predictions.

preprint, 2022, arXiv

Business Process Simulation with Differentiated Resources: Does it Make a Difference?

Business process simulation is a versatile technique to predict the impact of one or more changes on the performance of a process. Mainstream approaches in this space suffer from various limitations, some stemming from the fact that they treat resources as undifferentiated entities grouped into resource pools. These approaches assume that all resources in a pool have the same performance and share the same availability calendars. Previous studies have acknowledged these assumptions, without quantifying their impact on simulation model accuracy. This paper addresses this gap in the context of simulation models automatically discovered from event logs. The paper proposes a simulation approach and a method for discovering simulation models, wherein each resource is treated as an individual entity, with its own performance and availability calendar. An evaluation shows that simulation models with differentiated resources more closely replicate the distributions of cycle times and the work rhythm in a process than models with undifferentiated resources.

preprint, 2022, arXiv

Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning

Business process simulation is a well-known approach to estimate the impact of changes to a process with respect to time and cost measures -- a practice known as what-if process analysis. The usefulness of such estimations hinges on the accuracy of the underlying simulation model. Data-Driven Simulation (DDS) methods leverage process mining techniques to learn process simulation models from event logs. Empirical studies have shown that, while DDS models adequately capture the observed sequences of activities and their frequencies, they fail to accurately capture the temporal dynamics of real-life processes. In contrast, generative Deep Learning (DL) models are better able to capture such temporal dynamics. The drawback of DL models is that users cannot alter them for what-if analysis due to their black-box nature. This paper presents a hybrid approach to learn process simulation models from event logs wherein a (stochastic) process model is extracted via DDS techniques, and then combined with a DL model to generate timestamped event sequences. An experimental evaluation shows that the resulting hybrid simulation models match the temporal accuracy of pure DL models, while partially retaining the what-if analysis capability of DDS approaches.

preprint, 2022, arXiv

Libra: High-Utility Anonymization of Event Logs for Process Mining via Subsampling

Process mining techniques enable analysts to identify and assess process improvement opportunities based on event logs. A common roadblock to process mining is that event logs may contain private information that cannot be used for analysis without consent. An approach to overcome this roadblock is to anonymize the event log so that no individual represented in the original log can be singled out based on the anonymized one. Differential privacy is an anonymization approach that provides this guarantee. A differentially private event log anonymization technique seeks to produce an anonymized log that is as similar as possible to the original one (high utility) while providing a required privacy guarantee. Existing event log anonymization techniques operate by injecting noise into the traces in the log (e.g., duplicating, perturbing, or filtering out some traces). Recent work on differential privacy has shown that a better privacy-utility tradeoff can be achieved by applying subsampling prior to noise injection. In other words, subsampling amplifies privacy. This paper proposes an event log anonymization approach called Libra that exploits this observation. Libra extracts multiple samples of traces from a log, independently injects noise, retains statistically relevant traces from each sample, and composes the samples to produce a differentially private log. An empirical evaluation shows that the proposed approach leads to a considerably higher utility for equivalent privacy guarantees relative to existing baselines.
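
The statement that subsampling amplifies privacy can be illustrated with the standard amplification bound from the differential privacy literature: if a mechanism is epsilon-differentially private and is run on a random subsample that includes each record independently with probability q, the overall guarantee tightens to ln(1 + q(e^epsilon - 1)). The sketch below only evaluates that general bound. It is not Libra's privacy accounting, which the abstract does not detail, and the function name `amplified_epsilon` is a hypothetical choice.

```python
import math


def amplified_epsilon(epsilon: float, sampling_rate: float) -> float:
    """Privacy parameter after subsampling, per the standard amplification bound.

    Assumes each record is kept independently with probability `sampling_rate`
    before an `epsilon`-DP mechanism is applied (an illustrative assumption).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must lie in [0, 1]")
    # ln(1 + q * (e^eps - 1)), written with log1p/expm1 for numerical stability
    return math.log1p(sampling_rate * math.expm1(epsilon))
```

With a sampling rate of 1 the bound returns the original epsilon, and any smaller rate gives a strictly smaller value, which is the sense in which subsampling leaves more of the privacy budget for preserving utility.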

preprint, 2022, arXiv

Repairing Activity Start Times to Improve Business Process Simulation

Business Process Simulation (BPS) is a common technique to estimate the impact of business process changes, e.g. what would be the cycle time of a process if the number of traces increases? The starting point of BPS is a business process model annotated with simulation parameters (a BPS model). Several studies have proposed methods to automatically discover BPS models from event logs -- extracted from enterprise information systems -- via process mining techniques. These approaches model the processing time of each activity based on the start and end timestamps recorded in the event log. In practice, however, it is common that the recorded start times do not precisely reflect the actual start of the activities. For example, a resource starts working on an activity, but its start time is not recorded until she/he interacts with the system. If not corrected, these situations induce waiting times in which the resource is considered to be free, while she/he is actually working. To address this limitation, this article proposes a technique to identify the waiting time preceding each activity instance during which the resource is actually working on it, and to repair the start time so that it reflects the actual processing time. The idea of the proposed technique is that, as far as simulation is concerned, an activity instance may start once it is enabled and the corresponding resource is available. Accordingly, for each activity instance, the proposed technique estimates the activity enablement and the resource availability time based on the information available in the event log, and repairs the start time to include the non-recorded processing time. An empirical evaluation involving eight real-life event logs shows that the proposed approach leads to BPS models that closely reflect the temporal dynamics of the process.
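
The rule that an activity instance may start once it is enabled and its resource is available can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not taken from the paper: enablement is approximated by the end of the previous activity instance in the same case, resource availability by the end of the resource's previous activity instance, and a start time is only ever moved earlier. The `ActivityInstance` fields and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class ActivityInstance:
    case_id: str
    activity: str
    resource: str
    start: datetime
    end: datetime


def repair_start_times(log):
    """Return a copy of the log with start times moved back to the estimated
    earliest feasible start: max(enablement time, resource availability time)."""
    last_end_in_case = {}      # case_id -> end of the previous instance in that case
    last_end_of_resource = {}  # resource -> end of the resource's previous instance
    repaired = []
    for inst in sorted(log, key=lambda i: i.end):
        enabled = last_end_in_case.get(inst.case_id)
        available = last_end_of_resource.get(inst.resource)
        # Repair only when both estimates exist and the result is earlier
        # than the recorded start; otherwise keep the recorded start.
        if enabled is not None and available is not None:
            feasible_start = max(enabled, available)
            if feasible_start < inst.start:
                inst = replace(inst, start=feasible_start)
        repaired.append(inst)
        last_end_in_case[inst.case_id] = inst.end
        last_end_of_resource[inst.resource] = inst.end
    return repaired
```

For example, if an activity is enabled at 09:30, its resource becomes free at 09:40, and the recorded start is 10:00, the sketch moves the start to 09:40, so the 20 minutes before the first system interaction count as processing time rather than waiting time.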

preprint, 2022, arXiv

When to intervene? Prescriptive Process Monitoring Under Uncertainty and Resource Constraints

Prescriptive process monitoring approaches leverage historical data to prescribe runtime interventions that will likely prevent negative case outcomes or improve a process's performance. A centerpiece of a prescriptive process monitoring method is its intervention policy: a decision function determining if and when to trigger an intervention on an ongoing case. Previous proposals in this field rely on intervention policies that consider only the current state of a given case. These approaches do not consider the tradeoff between triggering an intervention in the current state, given the level of uncertainty of the underlying predictive models, versus delaying the intervention to a later state. Moreover, they assume that a resource is always available to perform an intervention (infinite capacity). This paper addresses these gaps by introducing a prescriptive process monitoring method that filters and ranks ongoing cases based on prediction scores, prediction uncertainty, and causal effect of the intervention, and triggers interventions to maximize a gain function, considering the available resources. The proposal is evaluated using a real-life event log. The results show that the proposed method outperforms existing baselines regarding total gain.

preprint, 2020, arXiv

Automated Discovery of Business Process Simulation Models from Event Logs

Business process simulation is a versatile technique to estimate the performance of a process under multiple scenarios. This, in turn, allows analysts to compare alternative options to improve a business process. A common roadblock for business process simulation is that constructing accurate simulation models is cumbersome and error-prone. Modern information systems store detailed execution logs of the business processes they support. Previous work has shown that these logs can be used to discover simulation models. However, existing methods for log-based discovery of simulation models do not seek to optimize the accuracy of the resulting models. Instead they leave it to the user to manually tune the simulation model to achieve the desired level of accuracy. This article presents an accuracy-optimized method to discover business process simulation models from execution logs. The method decomposes the problem into a series of steps with associated configuration parameters. A hyper-parameter optimization method is used to search through the space of possible configurations so as to maximize the similarity between the behavior of the simulation model and the behavior observed in the log. The method has been implemented as a tool and evaluated using logs from different domains.

preprint, 2020, arXiv

Automated Discovery of Data Transformations for Robotic Process Automation

Robotic Process Automation (RPA) is a technology for automating repetitive routines consisting of sequences of user interactions with one or more applications. In order to fully exploit the opportunities opened by RPA, companies need to discover which specific routines may be automated, and how. In this setting, this paper addresses the problem of analyzing User Interaction (UI) logs in order to discover routines where a user transfers data from one spreadsheet or (Web) form to another. The paper maps this problem to that of discovering data transformations by example - a problem for which several techniques are available. The paper shows that a naive application of a state-of-the-art technique for data transformation discovery is computationally inefficient. Accordingly, the paper proposes two optimizations that take advantage of the information in the UI log and the fact that data transfers across applications typically involve copying alphabetic and numeric tokens separately. The proposed approach and its optimizations are evaluated using UI logs that replicate a real-life repetitive data transfer routine.

preprint, 2020, arXiv

Detecting sudden and gradual drifts in business processes from execution traces

Business processes are prone to unexpected changes, as process workers may suddenly or gradually start executing a process differently in order to adjust to changes in workload, season, or other external factors. Early detection of business process changes enables managers to identify and act upon changes that may otherwise affect process performance. Business process drift detection refers to a family of methods to detect changes in a business process by analyzing event logs extracted from the systems that support the execution of the process. Existing methods for business process drift detection are based on an explorative analysis of a potentially large feature space and in some cases they require users to manually identify specific features that characterize the drift. Depending on the explored feature space, these methods miss various types of changes. Moreover, they are either designed to detect sudden drifts or gradual drifts but not both. This paper proposes an automated and statistically grounded method for detecting sudden and gradual business process drifts under a unified framework. An empirical evaluation shows that the method detects typical change patterns with significantly higher accuracy and lower detection delay than existing methods, while accurately distinguishing between sudden and gradual drifts.

preprint, 2020, arXiv

Discovering Business Process Simulation Models in the Presence of Multitasking

Business process simulation is a versatile technique for analyzing business processes from a quantitative perspective. A well-known limitation of process simulation is that the accuracy of the simulation results is limited by the faithfulness of the process model and simulation parameters given as input to the simulator. To tackle this limitation, several authors have proposed to discover simulation models from process execution logs so that the resulting simulation models more closely match reality. Existing techniques in this field assume that each resource in the process performs one task at a time. In reality, however, resources may engage in multitasking behavior. Traditional simulation approaches do not handle multitasking. Instead, they rely on a resource allocation approach wherein a task instance is only assigned to a resource when the resource is free. This inability to handle multitasking leads to an overestimation of execution times. This paper proposes an approach to discover multitasking in business process execution logs and to generate a simulation model that takes into account the discovered multitasking behavior. The key idea is to adjust the processing times of tasks in such a way that executing the multitasked tasks sequentially with the adjusted times is equivalent to executing them concurrently with the original processing times. The proposed approach is evaluated using a real-life dataset and synthetic datasets with different levels of multitasking. The results show that, in the presence of multitasking, the approach improves the accuracy of simulation models discovered from execution logs.
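
The key idea of adjusting processing times can be sketched with a sweep over one resource's timeline: the timeline is cut at every start and end point, and each segment's length is divided among the task instances active in it. Splitting each segment equally is an assumption made here for illustration; the abstract does not state how the paper apportions overlapping time. The function name and input layout are hypothetical.

```python
def adjusted_durations(intervals):
    """Adjust processing times of possibly overlapping task instances of ONE resource.

    `intervals` maps a task-instance id to a numeric (start, end) pair.
    Returns a dict of adjusted durations whose sum equals the resource's total
    busy time, so running the tasks one after another with the adjusted
    durations takes as long as running them concurrently with the originals.
    """
    # Every start or end point is a boundary where the set of active tasks may change
    points = sorted({t for start, end in intervals.values() for t in (start, end)})
    adjusted = {task: 0.0 for task in intervals}
    for left, right in zip(points, points[1:]):
        active = [task for task, (start, end) in intervals.items()
                  if start <= left and right <= end]
        # Idle segments have no active task and contribute nothing
        for task in active:
            adjusted[task] += (right - left) / len(active)
    return adjusted
```

For two instances running over (0, 10) and (5, 15), each receives an adjusted duration of 7.5. The sum, 15, is the time the resource was actually busy, whereas adding the original durations would give 20, which is the overestimation the abstract describes.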

preprint, 2020, arXiv

Discovering Generative Models from Event Logs: Data-driven Simulation vs Deep Learning

A generative model is a statistical model that is able to generate new data instances from previously observed ones. In the context of business processes, a generative model creates new execution traces from a set of historical traces, also known as an event log. Two families of generative process simulation models have been developed in previous work: data-driven simulation models and deep learning models. Until now, these two approaches have evolved independently and their relative performance has not been studied. This paper fills this gap by empirically comparing a data-driven simulation technique with multiple deep learning techniques, which construct models that are capable of generating execution traces with timestamped events. The study sheds light on the relative strengths of both approaches and raises the prospect of developing hybrid approaches that combine these strengths.

preprint, 2020, arXiv

Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs

This paper proposes an approach to analyze an event log of a business process in order to generate case-level recommendations of treatments that maximize the probability of a given outcome. Users classify the attributes in the event log into controllable and non-controllable, where the former correspond to attributes that can be altered during an execution of the process (the possible treatments). We use an action rule mining technique to identify treatments that co-occur with the outcome under some conditions. Since action rules are generated based on correlation rather than causation, we then use a causal machine learning technique, specifically uplift trees, to discover subgroups of cases for which a treatment has a high causal effect on the outcome after adjusting for confounding variables. We test the relevance of this approach using an event log of a loan application process and compare our findings with recommendations manually produced by process mining experts.

preprint, 2020, arXiv

Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components

Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the model and the log. A desirable feature of a conformance checking technique is to identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfil this property exhibit limited scalability when confronted with large and complex models and logs. This paper presents two complementary techniques to address these shortcomings. The first technique transforms the model and log into two automata. These automata are compared using an error-correcting synchronized product, computed via an A* search that guarantees the resulting automaton captures all differences with a minimal number of error corrections. The synchronized product is used to extract minimal-length alignments between each trace of the log and the closest corresponding trace of the model. A limitation of the first technique is that as the level of concurrency in the model increases, the size of the automaton of the model grows exponentially, thus hampering scalability. To address this limitation, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product is computed for each S-component separately and the resulting automata are recomposed into a single product automaton capturing all differences without minimality guarantees. An empirical evaluation shows that the proposed techniques outperform state-of-the-art baselines in terms of computational efficiency. Moreover, the decomposition-based technique is optimal for the vast majority of datasets and quasi-optimal for the remaining ones.

preprint, 2020, arXiv

Secure Multi-Party Computation for Inter-Organizational Process Mining

Process mining is a family of techniques for analysing business processes based on event logs extracted from information systems. Mainstream process mining tools are designed for intra-organizational settings, insofar as they assume that an event log is available for processing as a whole. The use of such tools for inter-organizational process analysis is hampered by the fact that such processes involve independent parties who are unwilling to, or sometimes legally prevented from, sharing detailed event logs with each other. In this setting, this paper proposes an approach for constructing and querying a common type of artifact used for process mining, namely the frequency and time-annotated Directly-Follows Graph (DFG), over multiple event logs belonging to different parties, in such a way that the parties do not share the event logs with each other. The proposal leverages an existing platform for secure multi-party computation, namely Sharemind. Since a direct implementation of DFG construction in Sharemind suffers from scalability issues, the paper proposes to rely on vectorization of event logs and to employ a divide-and-conquer scheme for parallel processing of sub-logs. The paper reports on an experimental evaluation that tests the scalability of the approach on real-life logs.
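
To make the target artifact concrete, the sketch below builds a frequency and time-annotated Directly-Follows Graph from a single event log in the clear. It involves no secure multi-party computation and does not use Sharemind; it only shows what the parties would jointly compute. The input layout (tuples of case id, activity, numeric timestamp) and the function name `build_dfg` are assumptions for illustration.

```python
from collections import defaultdict


def build_dfg(event_log):
    """Build a frequency and time-annotated Directly-Follows Graph.

    `event_log` is an iterable of (case_id, activity, timestamp) tuples with
    numeric timestamps. Returns a dict mapping each edge (a, b) to a pair
    (frequency, total_elapsed_time) over all cases where b directly follows a.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))

    frequency = defaultdict(int)
    total_time = defaultdict(float)
    for events in traces.values():
        events.sort(key=lambda event: event[0])  # order each trace by timestamp
        for (t1, a1), (t2, a2) in zip(events, events[1:]):
            frequency[(a1, a2)] += 1
            total_time[(a1, a2)] += t2 - t1
    return {edge: (frequency[edge], total_time[edge]) for edge in frequency}
```

Because each case contributes its directly-follows pairs independently of the others, this computation lends itself to being split by case, which is consistent with the divide-and-conquer processing of sub-logs mentioned in the abstract.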