Source author record

Marlon Dumas

Marlon Dumas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Software Engineering; Artificial Intelligence; Machine Learning; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Other Computer Science; Performance; Databases; Information Theory; math.IT; physics.soc-ph; Social and Information Networks

Catalog footprint

What is connected

24 works

12 topics

4 close collaborators

preprint 2026 arXiv

Exploring LLM Features in Predictive Process Monitoring for Small-Scale Event-Logs

Predictive Process Monitoring is a branch of process mining that aims to predict the outcome of an ongoing process. Recently, it has leveraged machine learning and deep learning architectures. In this paper, we extend our prior LLM-based Predictive Process Monitoring framework, which initially focused on total time prediction via prompting. The extension consists of a comprehensive evaluation of its generality, semantic leverage, and reasoning mechanisms across multiple Key Performance Indicators. Empirical evaluations conducted on three distinct event logs, covering the Key Performance Indicators of Total Time and Activity Occurrence prediction, indicate that, in data-scarce settings with only 100 traces, the LLM surpasses the benchmark methods. The experiments also show that the LLM exploits both its embodied prior knowledge and the internal correlations among training traces. Finally, we examine the reasoning strategies employed by the model, demonstrating that the LLM does not merely replicate existing predictive methods but performs higher-order reasoning to generate the predictions.

preprint 2022 arXiv

Business Process Simulation with Differentiated Resources: Does it Make a Difference?

Business process simulation is a versatile technique to predict the impact of one or more changes on the performance of a process. Mainstream approaches in this space suffer from various limitations, some stemming from the fact that they treat resources as undifferentiated entities grouped into resource pools. These approaches assume that all resources in a pool have the same performance and share the same availability calendars. Previous studies have acknowledged these assumptions, without quantifying their impact on simulation model accuracy. This paper addresses this gap in the context of simulation models automatically discovered from event logs. The paper proposes a simulation approach and a method for discovering simulation models, wherein each resource is treated as an individual entity, with its own performance and availability calendar. An evaluation shows that simulation models with differentiated resources more closely replicate the distributions of cycle times and the work rhythm in a process than models with undifferentiated resources.

preprint 2022 arXiv

Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning

Business process simulation is a well-known approach to estimate the impact of changes to a process with respect to time and cost measures -- a practice known as what-if process analysis. The usefulness of such estimations hinges on the accuracy of the underlying simulation model. Data-Driven Simulation (DDS) methods leverage process mining techniques to learn process simulation models from event logs. Empirical studies have shown that, while DDS models adequately capture the observed sequences of activities and their frequencies, they fail to accurately capture the temporal dynamics of real-life processes. In contrast, generative Deep Learning (DL) models are better able to capture such temporal dynamics. The drawback of DL models is that users cannot alter them for what-if analysis due to their black-box nature. This paper presents a hybrid approach to learn process simulation models from event logs wherein a (stochastic) process model is extracted via DDS techniques, and then combined with a DL model to generate timestamped event sequences. An experimental evaluation shows that the resulting hybrid simulation models match the temporal accuracy of pure DL models, while partially retaining the what-if analysis capability of DDS approaches.

preprint 2022 arXiv

Libra: High-Utility Anonymization of Event Logs for Process Mining via Subsampling

Process mining techniques enable analysts to identify and assess process improvement opportunities based on event logs. A common roadblock to process mining is that event logs may contain private information that cannot be used for analysis without consent. An approach to overcome this roadblock is to anonymize the event log so that no individual represented in the original log can be singled out based on the anonymized one. Differential privacy is an anonymization approach that provides this guarantee. A differentially private event log anonymization technique seeks to produce an anonymized log that is as similar as possible to the original one (high utility) while providing a required privacy guarantee. Existing event log anonymization techniques operate by injecting noise into the traces in the log (e.g., duplicating, perturbing, or filtering out some traces). Recent work on differential privacy has shown that a better privacy-utility tradeoff can be achieved by applying subsampling prior to noise injection. In other words, subsampling amplifies privacy. This paper proposes an event log anonymization approach called Libra that exploits this observation. Libra extracts multiple samples of traces from a log, independently injects noise, retains statistically relevant traces from each sample, and composes the samples to produce a differentially private log. An empirical evaluation shows that the proposed approach leads to a considerably higher utility for equivalent privacy guarantees relative to existing baselines.
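The statement that subsampling amplifies privacy can be made concrete with the standard amplification-by-subsampling bound from the differential privacy literature: running an epsilon-differentially-private mechanism on a random subsample drawn at rate q yields a guarantee of ln(1 + q(e^epsilon - 1)). The sketch below only evaluates that general bound. It is not taken from Libra, whose privacy accounting may differ, and the function name `amplified_epsilon` is hypothetical.

```python
import math


def amplified_epsilon(epsilon: float, sampling_rate: float) -> float:
    """Privacy parameter after applying an epsilon-DP mechanism to a random subsample.

    Uses the standard bound ln(1 + q * (e^epsilon - 1)), where q is the
    sampling rate. This is a general result, not Libra's own accounting.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must lie in [0, 1]")
    # log1p/expm1 keep the computation accurate for small epsilon.
    return math.log1p(sampling_rate * math.expm1(epsilon))


if __name__ == "__main__":
    for q in (1.0, 0.5, 0.1):
        print(q, amplified_epsilon(1.0, q))
```

With a sampling rate of 1 the bound returns the original epsilon, and it shrinks as the sampling rate decreases, which is why less noise is needed for the same overall guarantee.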

preprint 2022 arXiv

Repairing Activity Start Times to Improve Business Process Simulation

Business Process Simulation (BPS) is a common technique to estimate the impact of business process changes, e.g., what would be the cycle time of a process if the number of traces increases? The starting point of BPS is a business process model annotated with simulation parameters (a BPS model). Several studies have proposed methods to automatically discover BPS models from event logs -- extracted from enterprise information systems -- via process mining techniques. These approaches model the processing time of each activity based on the start and end timestamps recorded in the event log. In practice, however, it is common that the recorded start times do not precisely reflect the actual start of the activities. For example, a resource starts working on an activity, but its start time is not recorded until she/he interacts with the system. If not corrected, these situations induce waiting times in which the resource is considered to be free, while she/he is actually working. To address this limitation, this article proposes a technique to identify the waiting time prior to each activity instance during which the resource is actually working on it, and to repair the start time so that it reflects the actual processing time. The idea of the proposed technique is that, as far as simulation is concerned, an activity instance may start once it is enabled and the corresponding resource is available. Accordingly, for each activity instance, the proposed technique estimates the activity enablement and the resource availability time based on the information available in the event log, and repairs the start time to include the non-recorded processing time. An empirical evaluation involving eight real-life event logs shows that the proposed approach leads to BPS models that closely reflect the temporal dynamics of the process.
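The core idea stated in the abstract, that an activity instance may start once it is enabled and its resource is available, can be sketched as taking the later of those two estimated times. The function below is a minimal illustration under that reading. The name `repair_start_time`, its parameters, and the choice to never move a start later than the recorded one are assumptions, not details from the paper, and estimating the enablement and availability times from the log is left out.

```python
from datetime import datetime


def repair_start_time(recorded_start: datetime,
                      enabled_at: datetime,
                      resource_free_at: datetime) -> datetime:
    """Estimate the actual start of an activity instance.

    The instance can start once it is enabled AND its resource is free,
    so the earliest feasible start is the later of the two times.
    Assumption: the repair only moves the start earlier, never later
    than the recorded start.
    """
    earliest_feasible = max(enabled_at, resource_free_at)
    return min(earliest_feasible, recorded_start)


if __name__ == "__main__":
    repaired = repair_start_time(
        recorded_start=datetime(2024, 1, 1, 10, 0),
        enabled_at=datetime(2024, 1, 1, 9, 0),
        resource_free_at=datetime(2024, 1, 1, 9, 30),
    )
    print(repaired)
```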

preprint 2022 arXiv

When to intervene? Prescriptive Process Monitoring Under Uncertainty and Resource Constraints

Prescriptive process monitoring approaches leverage historical data to prescribe runtime interventions that will likely prevent negative case outcomes or improve a process's performance. A centerpiece of a prescriptive process monitoring method is its intervention policy: a decision function determining if and when to trigger an intervention on an ongoing case. Previous proposals in this field rely on intervention policies that consider only the current state of a given case. These approaches do not consider the tradeoff between triggering an intervention in the current state, given the level of uncertainty of the underlying predictive models, versus delaying the intervention to a later state. Moreover, they assume that a resource is always available to perform an intervention (infinite capacity). This paper addresses these gaps by introducing a prescriptive process monitoring method that filters and ranks ongoing cases based on prediction scores, prediction uncertainty, and causal effect of the intervention, and triggers interventions to maximize a gain function, considering the available resources. The proposal is evaluated using a real-life event log. The results show that the proposed method outperforms existing baselines regarding total gain.
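The filter-and-rank step under a resource limit can be sketched roughly as follows. This is an illustration only: the `Case` fields, the thresholds, and the ranking by estimated causal effect are hypothetical simplifications, whereas the paper defines its own gain function and also weighs intervening now against intervening later, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    risk: float         # predicted probability of a negative outcome
    uncertainty: float  # uncertainty of that prediction
    effect: float       # estimated causal effect of intervening


def select_interventions(cases, free_resources,
                         risk_min=0.5, uncertainty_max=0.25):
    """Pick which ongoing cases to intervene on, given limited resources.

    Filter: keep cases that are risky enough, predicted confidently
    enough, and for which the intervention is expected to help.
    Rank: order the remaining cases by estimated effect (a stand-in
    for the gain function) and take as many as there are free resources.
    """
    candidates = [
        c for c in cases
        if c.risk >= risk_min
        and c.uncertainty <= uncertainty_max
        and c.effect > 0
    ]
    ranked = sorted(candidates, key=lambda c: c.effect, reverse=True)
    return [c.case_id for c in ranked[:max(0, free_resources)]]


if __name__ == "__main__":
    ongoing = [
        Case("c1", risk=0.9, uncertainty=0.10, effect=0.3),
        Case("c2", risk=0.8, uncertainty=0.40, effect=0.5),
        Case("c3", risk=0.7, uncertainty=0.20, effect=0.4),
        Case("c4", risk=0.3, uncertainty=0.10, effect=0.6),
    ]
    print(select_interventions(ongoing, free_resources=1))
```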

preprint 2020 arXiv

Automated Discovery of Business Process Simulation Models from Event Logs

Business process simulation is a versatile technique to estimate the performance of a process under multiple scenarios. This, in turn, allows analysts to compare alternative options to improve a business process. A common roadblock for business process simulation is that constructing accurate simulation models is cumbersome and error-prone. Modern information systems store detailed execution logs of the business processes they support. Previous work has shown that these logs can be used to discover simulation models. However, existing methods for log-based discovery of simulation models do not seek to optimize the accuracy of the resulting models. Instead they leave it to the user to manually tune the simulation model to achieve the desired level of accuracy. This article presents an accuracy-optimized method to discover business process simulation models from execution logs. The method decomposes the problem into a series of steps with associated configuration parameters. A hyper-parameter optimization method is used to search through the space of possible configurations so as to maximize the similarity between the behavior of the simulation model and the behavior observed in the log. The method has been implemented as a tool and evaluated using logs from different domains.

preprint 2020 arXiv

Automated Discovery of Data Transformations for Robotic Process Automation

Robotic Process Automation (RPA) is a technology for automating repetitive routines consisting of sequences of user interactions with one or more applications. In order to fully exploit the opportunities opened by RPA, companies need to discover which specific routines may be automated, and how. In this setting, this paper addresses the problem of analyzing User Interaction (UI) logs in order to discover routines where a user transfers data from one spreadsheet or (Web) form to another. The paper maps this problem to that of discovering data transformations by example - a problem for which several techniques are available. The paper shows that a naive application of a state-of-the-art technique for data transformation discovery is computationally inefficient. Accordingly, the paper proposes two optimizations that take advantage of the information in the UI log and the fact that data transfers across applications typically involve copying alphabetic and numeric tokens separately. The proposed approach and its optimizations are evaluated using UI logs that replicate a real-life repetitive data transfer routine.

preprint 2020 arXiv

Detecting sudden and gradual drifts in business processes from execution traces

Business processes are prone to unexpected changes, as process workers may suddenly or gradually start executing a process differently in order to adjust to changes in workload, season, or other external factors. Early detection of business process changes enables managers to identify and act upon changes that may otherwise affect process performance. Business process drift detection refers to a family of methods to detect changes in a business process by analyzing event logs extracted from the systems that support the execution of the process. Existing methods for business process drift detection are based on an explorative analysis of a potentially large feature space and in some cases they require users to manually identify specific features that characterize the drift. Depending on the explored feature space, these methods miss various types of changes. Moreover, they are either designed to detect sudden drifts or gradual drifts but not both. This paper proposes an automated and statistically grounded method for detecting sudden and gradual business process drifts under a unified framework. An empirical evaluation shows that the method detects typical change patterns with significantly higher accuracy and lower detection delay than existing methods, while accurately distinguishing between sudden and gradual drifts.

preprint 2020 arXiv

Discovering Business Process Simulation Models in the Presence of Multitasking

Business process simulation is a versatile technique for analyzing business processes from a quantitative perspective. A well-known limitation of process simulation is that the accuracy of the simulation results is limited by the faithfulness of the process model and simulation parameters given as input to the simulator. To tackle this limitation, several authors have proposed to discover simulation models from process execution logs so that the resulting simulation models more closely match reality. Existing techniques in this field assume that each resource in the process performs one task at a time. In reality, however, resources may engage in multitasking behavior. Traditional simulation approaches do not handle multitasking. Instead, they rely on a resource allocation approach wherein a task instance is only assigned to a resource when the resource is free. This inability to handle multitasking leads to an overestimation of execution times. This paper proposes an approach to discover multitasking in business process execution logs and to generate a simulation model that takes into account the discovered multitasking behavior. The key idea is to adjust the processing times of tasks in such a way that executing the multitasked tasks sequentially with the adjusted times is equivalent to executing them concurrently with the original processing times. The proposed approach is evaluated using a real-life dataset and synthetic datasets with different levels of multitasking. The results show that, in the presence of multitasking, the approach improves the accuracy of simulation models discovered from execution logs.
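One way to read the key idea is as a sweep over the timeline of a single resource: whenever several tasks overlap, the elapsed time of that segment is divided among them, so that the adjusted durations add up to the time the resource was actually busy. The sketch below assumes an equal split among concurrent tasks, which is an illustrative choice and not necessarily the adjustment rule used in the paper. The function name and input layout are hypothetical.

```python
def adjusted_durations(tasks):
    """Adjust processing times of possibly overlapping tasks of ONE resource.

    tasks: dict mapping a task name to a (start, end) pair of numbers.
    Returns a dict mapping each task name to its adjusted duration.
    Assumption: time spent in an overlapping segment is split equally
    among the tasks active in that segment.
    """
    # Every start or end timestamp delimits a segment of the timeline.
    points = sorted({t for start, end in tasks.values() for t in (start, end)})
    adjusted = {name: 0.0 for name in tasks}
    for left, right in zip(points, points[1:]):
        active = [name for name, (start, end) in tasks.items()
                  if start <= left and right <= end]
        for name in active:
            adjusted[name] += (right - left) / len(active)
    return adjusted


if __name__ == "__main__":
    # A runs over [0, 10] and B over [5, 15]; they overlap during [5, 10].
    print(adjusted_durations({"A": (0, 10), "B": (5, 15)}))
```

In this example the original durations sum to 20 time units although the resource was busy for only 15. The adjusted durations sum to 15, so running the tasks one after another with the adjusted times occupies the resource for the same amount of time.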

preprint 2020 arXiv

Discovering Generative Models from Event Logs: Data-driven Simulation vs Deep Learning

A generative model is a statistical model that is able to generate new data instances from previously observed ones. In the context of business processes, a generative model creates new execution traces from a set of historical traces, also known as an event log. Two families of generative process simulation models have been developed in previous work: data-driven simulation models and deep learning models. Until now, these two approaches have evolved independently and their relative performance has not been studied. This paper fills this gap by empirically comparing a data-driven simulation technique with multiple deep learning techniques, which construct models capable of generating execution traces with timestamped events. The study sheds light on the relative strengths of both approaches and raises the prospect of developing hybrid approaches that combine these strengths.

preprint 2020 arXiv

Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs

This paper proposes an approach to analyze an event log of a business process in order to generate case-level recommendations of treatments that maximize the probability of a given outcome. Users classify the attributes in the event log into controllable and non-controllable, where the former correspond to attributes that can be altered during an execution of the process (the possible treatments). We use an action rule mining technique to identify treatments that co-occur with the outcome under some conditions. Since action rules are generated based on correlation rather than causation, we then use a causal machine learning technique, specifically uplift trees, to discover subgroups of cases for which a treatment has a high causal effect on the outcome after adjusting for confounding variables. We test the relevance of this approach using an event log of a loan application process and compare our findings with recommendations manually produced by process mining experts.

preprint 2020 arXiv

Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components

Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the model and the log. A desirable feature of a conformance checking technique is to identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfil this property exhibit limited scalability when confronted with large and complex models and logs. This paper presents two complementary techniques to address these shortcomings. The first technique transforms the model and log into two automata. These automata are compared using an error-correcting synchronized product, computed via an A* search that guarantees the resulting automaton captures all differences with a minimal number of error corrections. The synchronized product is used to extract minimal-length alignments between each trace of the log and the closest corresponding trace of the model. A limitation of the first technique is that as the level of concurrency in the model increases, the size of the automaton of the model grows exponentially, thus hampering scalability. To address this limitation, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product is computed for each S-component separately and the resulting automata are recomposed into a single product automaton capturing all differences without minimality guarantees. An empirical evaluation shows that the proposed techniques outperform state-of-the-art baselines in terms of computational efficiency. Moreover, the decomposition-based technique is optimal for the vast majority of datasets and quasi-optimal for the remaining ones.

preprint 2020 arXiv

Secure Multi-Party Computation for Inter-Organizational Process Mining

Process mining is a family of techniques for analysing business processes based on event logs extracted from information systems. Mainstream process mining tools are designed for intra-organizational settings, insofar as they assume that an event log is available for processing as a whole. The use of such tools for inter-organizational process analysis is hampered by the fact that such processes involve independent parties who are unwilling to, or sometimes legally prevented from, sharing detailed event logs with each other. In this setting, this paper proposes an approach for constructing and querying a common type of artifact used for process mining, namely the frequency and time-annotated Directly-Follows Graph (DFG), over multiple event logs belonging to different parties, in such a way that the parties do not share the event logs with each other. The proposal leverages an existing platform for secure multi-party computation, namely Sharemind. Since a direct implementation of DFG construction in Sharemind suffers from scalability issues, the paper proposes to rely on vectorization of event logs and to employ a divide-and-conquer scheme for parallel processing of sub-logs. The paper reports on an experimental evaluation that tests the scalability of the approach on real-life logs.
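For readers unfamiliar with the artifact, a frequency and time-annotated Directly-Follows Graph can be computed from a single plaintext event log as sketched below. This shows only what the DFG contains. It involves no secure multi-party computation and does not use Sharemind, and the tuple layout and the name `build_dfg` are assumptions made for illustration.

```python
from collections import defaultdict


def build_dfg(events):
    """Build a frequency and time-annotated Directly-Follows Graph.

    events: iterable of (case_id, activity, timestamp) with numeric timestamps.
    Returns a dict mapping each arc (a, b) to a pair
    (number of times b directly follows a, mean time between them).
    """
    # Group the events into one trace per case.
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    frequency = defaultdict(int)
    total_gap = defaultdict(float)
    for trace in traces.values():
        trace.sort(key=lambda event: event[0])  # order each trace by time
        for (t1, a), (t2, b) in zip(trace, trace[1:]):
            frequency[(a, b)] += 1
            total_gap[(a, b)] += t2 - t1

    return {arc: (frequency[arc], total_gap[arc] / frequency[arc])
            for arc in frequency}


if __name__ == "__main__":
    log = [
        ("c1", "A", 0), ("c1", "B", 4), ("c1", "C", 10),
        ("c2", "A", 1), ("c2", "B", 3),
    ]
    print(build_dfg(log))
```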

preprint 2016 arXiv

Business Process Deviance Mining: Review and Evaluation

Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to its expected or desirable outcomes. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing business process event logs. This article provides a systematic review and comparative evaluation of deviance mining approaches based on a family of data mining techniques known as sequence classification. Using real-life logs from multiple domains, we evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions of a process. We also analyze the interestingness of the rule sets extracted using different methods. We observe that feature sets extracted using pattern mining techniques only slightly outperform simpler feature sets based on counts of individual activity occurrences in a trace.
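The simpler feature sets mentioned at the end of the abstract, counts of individual activity occurrences in a trace, can be sketched as a fixed-length vector per trace. The function name and the explicit `alphabet` argument are illustrative assumptions; the resulting vectors would then be passed to a classifier of the reader's choice.

```python
from collections import Counter


def activity_count_features(trace, alphabet):
    """Encode a trace as counts of each activity, in the order given by alphabet.

    trace: sequence of activity labels observed in one process execution.
    alphabet: ordered list of all activity labels used as feature columns.
    """
    counts = Counter(trace)
    # Counter returns 0 for activities that never occur in the trace.
    return [counts[activity] for activity in alphabet]


if __name__ == "__main__":
    print(activity_count_features(["A", "B", "A", "C"], ["A", "B", "C", "D"]))
```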

preprint 2016 arXiv

Modelling Families of Business Process Variants: A Decomposition Driven Method

Business processes usually do not exist as singular entities that can be managed in isolation, but rather as families of business process variants. When modelling such families of variants, analysts are confronted with the choice between modelling each variant separately, or modelling multiple or all variants in a single model. Modelling each variant separately leads to a proliferation of models that share common parts, resulting in redundancies and inconsistencies. Meanwhile, modelling all variants together leads to fewer but more complex models, thus hindering comprehensibility. This paper introduces a method for modelling families of process variants that addresses this trade-off. The key tenet of the method is to alternate between steps of decomposition (breaking down processes into sub-processes) and deciding which parts should be modelled together and which ones should be modelled separately. We have applied the method to two case studies: one concerning the consolidation of existing process models, and another dealing with green-field process discovery. In both cases, the method produced fewer models with respect to the baseline and reduced duplicity by up to 50% without significant impact on complexity.

preprint 2016 arXiv

Semantics and Analysis of DMN Decision Tables

The Decision Model and Notation (DMN) is a standard notation to capture decision logic in business applications in general and business processes in particular. A central construct in DMN is that of a decision table. The increasing use of DMN decision tables to capture critical business knowledge raises the need to support analysis tasks on these tables such as correctness and completeness checking. This paper provides a formal semantics for DMN tables, a formal definition of key analysis tasks and scalable algorithms to tackle two such tasks, i.e., detection of overlapping rules and of missing rules. The algorithms are based on a geometric interpretation of decision tables that can be used to support other analysis tasks by tapping into geometric algorithms. The algorithms have been implemented in an open-source DMN editor and tested on large decision tables derived from a credit lending dataset.
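The geometric interpretation can be illustrated by treating each rule with numeric range conditions as a hyper-rectangle, one closed interval per input column: two rules overlap exactly when their intervals intersect on every column. The sketch below checks all pairs naively, which is quadratic in the number of rules and is not the scalable algorithm the paper describes. The rule representation as a dict of `(low, high)` pairs is an assumption for illustration.

```python
from itertools import combinations


def rules_overlap(rule_a, rule_b):
    """Return True if two rules can match the same input.

    Each rule maps a column name to a closed interval (low, high).
    Two hyper-rectangles intersect iff their intervals intersect
    on every column. Both rules are assumed to share the same columns.
    """
    return all(
        max(rule_a[col][0], rule_b[col][0]) <= min(rule_a[col][1], rule_b[col][1])
        for col in rule_a
    )


def find_overlaps(rules):
    """Return index pairs (i, j), i < j, of overlapping rules (naive pairwise check)."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rules), 2)
        if rules_overlap(a, b)
    ]


if __name__ == "__main__":
    table = [
        {"age": (18, 30), "amount": (0, 1000)},
        {"age": (25, 40), "amount": (500, 2000)},
        {"age": (41, 60), "amount": (0, 2000)},
    ]
    print(find_overlaps(table))
```

Detecting missing rules is the complementary question, namely which regions of the input space are covered by no rectangle, and it is not covered by this sketch.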

preprint 2015 arXiv

Browserbite: Cross-Browser Testing via Image Processing

Cross-browser compatibility testing is concerned with identifying perceptible differences in the way a Web page is rendered across different browsers or configurations thereof. Existing automated cross-browser compatibility testing methods are generally based on Document Object Model (DOM) analysis, or in some cases, a combination of DOM analysis with screenshot capture and image processing. DOM analysis however may miss incompatibilities that arise not during DOM construction, but rather during rendering. Conversely, DOM analysis produces false alarms because different DOMs may lead to identical or sufficiently similar renderings. This paper presents a novel method for cross-browser testing based purely on image processing. The method relies on image segmentation to extract regions from a Web page and computer vision techniques to extract a set of characteristic features from each region. Regions extracted from a screenshot taken on a baseline browser are compared against regions extracted from the browser under test based on characteristic features. A machine learning classifier is used to determine if differences between two matched regions should be classified as an incompatibility. An evaluation involving 140 pages shows that the proposed method achieves an F-score exceeding 0.9, outperforming a state-of-the-art cross-browser testing tool based on DOM analysis.

preprint 2015 arXiv

Clustering-Based Predictive Process Monitoring

Business process enactment is generally supported by information systems that record data about process executions, which can be extracted as event logs. Predictive process monitoring is concerned with exploiting such event logs to predict how running (uncompleted) cases will unfold up to their completion. In this paper, we propose a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case. The predicate can be, for example, a temporal logic constraint or a time constraint, or any predicate that can be evaluated over a completed trace. The framework takes into account both the sequence of events observed in the current trace and the data attributes associated with these events. The prediction problem is approached in two phases. First, prefixes of previous traces are clustered according to control flow information. Second, a classifier is built for each cluster using event data to discriminate between fulfillments and violations. At runtime, a prediction is made on a running case by mapping it to a cluster and applying the corresponding classifier. The framework has been implemented in the ProM toolset and validated on a log pertaining to the treatment of cancer patients in a large hospital.

preprint 2013 arXiv

Artifact Lifecycle Discovery

Artifact-centric modeling is a promising approach for modeling business processes based on the so-called business artifacts - key entities driving the company's operations and whose lifecycles define the overall business process. While artifact-centric modeling shows significant advantages, the overwhelming majority of existing process mining methods cannot be applied (directly) as they are tailored to discover monolithic process models. This paper addresses the problem by proposing a chain of methods that can be applied to discover artifact lifecycle models in Guard-Stage-Milestone notation. We decompose the problem in such a way that a wide range of existing (non-artifact-centric) process discovery and analysis methods can be reused in a flexible manner. The methods presented in this paper are implemented as software plug-ins for ProM, a generic open-source framework and architecture for implementing process mining tools.

preprint 2013 arXiv

Bursty egocentric network evolution in Skype

In this study we analyze the dynamics of the contact list evolution of millions of users of the Skype communication network. We find that egocentric networks evolve heterogeneously in time as events of edge additions and deletions of individuals are grouped in long bursty clusters, which are separated by long inactive periods. We classify users by their link creation dynamics and show that bursty peaks of contact additions are likely to appear shortly after user account creation. We also study possible relations between bursty contact addition activity and other user-initiated actions like free and paid service adoption events. We show that bursts of contact additions are associated with increases in activity and adoption - an observation that can inform the design of targeted marketing tactics.

preprint 2012 arXiv

Squeezing out the Cloud via Profit-Maximizing Resource Allocation Policies

We study the problem of maximizing the average hourly profit earned by a Software-as-a-Service (SaaS) provider who runs a software service on behalf of a customer using servers rented from an Infrastructure-as-a-Service (IaaS) provider. The SaaS provider earns a fee per successful transaction and incurs costs proportional to the number of server-hours it uses. A number of resource allocation policies for this or similar problems have been proposed in previous work. However, to the best of our knowledge, these policies have not been comparatively evaluated in a cloud environment. This paper reports on an empirical evaluation of three policies using a replica of Wikipedia deployed on the Amazon EC2 cloud. Experimental results show that a policy based on a solution to an optimization problem derived from the SaaS provider's utility function outperforms well-known heuristics that have been proposed for similar problems. It is also shown that all three policies outperform a "reactive" allocation approach based on Amazon's auto-scaling feature.

preprint 2011 arXiv

Reserved or On-Demand Instances? A Revenue Maximization Model for Cloud Providers

We examine the problem of managing a server farm in a way that attempts to maximize the net revenue earned by a cloud provider by renting servers to customers according to a typical Platform-as-a-Service model. The cloud provider offers its resources to two classes of customers: 'premium' and 'basic'. Premium customers pay upfront fees to reserve servers for a specified period of time (e.g., a year). Premium customers can submit jobs for their reserved servers at any time and pay a fee for the server-hours they use. The provider is liable to pay a penalty every time a 'premium' job cannot be executed due to lack of resources. On the other hand, 'basic' customers are served on a best-effort basis, and pay a server-hour fee that may be higher than the one paid by premium customers. The provider incurs energy costs when running servers. Hence, it has an incentive to turn off idle servers. The question of how to choose the number of servers to allocate to each pool (basic and premium) is answered by analyzing a suitable queuing model and maximizing a revenue function. Experimental results show that the proposed scheme adapts to different traffic conditions, penalty levels, energy costs and usage fees.

preprint 2010 arXiv

Predicting Coding Effort in Projects Containing XML Code

This paper studies the problem of predicting the coding effort for a subsequent year of development by analysing metrics extracted from project repositories, with an emphasis on projects containing XML code. The study considers thirteen open source projects and applies machine learning algorithms to generate models to predict one-year coding effort, measured in terms of lines of code added, modified and deleted. Both organisational and code metrics associated with revisions are taken into account. The results show that coding effort is largely determined by the expertise of developers, while source code metrics do little to improve the accuracy of coding effort estimates. The study also shows that models trained on one project are unreliable at estimating effort in other projects.