Researcher profile

Mehrdad Sabetzadeh

Mehrdad Sabetzadeh contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
11works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

DSL or Code? Evaluating the Quality of LLM-Generated Algebraic Specifications: A Case Study in Optimization at Kinaxis

Model-driven engineering (MDE) provides abstraction and analytical rigour, but industrial adoption in many domains has been limited by the cost of developing and maintaining models. Large language models (LLMs) can help shift this cost balance by supporting direct generation of models from natural-language (NL) descriptions. For domain-specific languages (DSLs), however, LLM-generated models may be less accurate than LLM-generated code in mainstream languages such as Python, due to the latter's dominance in LLM training corpora. We investigate this issue in mathematical optimization, with AMPL, a DSL with established industrial use. We introduce EXEOS, an LLM-based approach that derives AMPL models and Python code from NL problem descriptions and iteratively refines them with solver feedback. Using a public optimization dataset and real-world supply-chain cases from our industrial partner Kinaxis, we evaluate generated AMPL models against Python code in terms of executability and correctness. An ablation study with two LLM families shows that AMPL is competitive with, and sometimes better than, Python, and that our design choices in EXEOS improve the quality of generated specifications.

preprint2026arXiv

Question Answering for Multi-Release Systems: A Case Study at Ciena

Companies regularly have to contend with multi-release systems, where several versions of the same software are in operation simultaneously. Question answering over documents from multi-release systems poses challenges because different releases have distinct yet overlapping documentation. Motivated by the observed inaccuracy of state-of-the-art question-answering techniques on multi-release system documents, we propose QAMR, a chatbot designed to answer questions across multi-release system documentation. QAMR enhances traditional retrieval-augmented generation (RAG) to ensure accuracy in the face of highly similar yet distinct documentation for different releases. It achieves this through a novel combination of pre-processing, query rewriting, and context selection. In addition, QAMR employs a dual-chunking strategy to enable separately tuned chunk sizes for retrieval and answer generation, improving overall question-answering accuracy. We evaluate QAMR using a public software-engineering benchmark as well as a collection of real-world, multi-release system documents from our industry partner, Ciena. Our evaluation yields five main findings: (1) QAMR outperforms a baseline RAG-based chatbot, achieving an average answer correctness of 88.5% and an average retrieval accuracy of 90%, which correspond to improvements of 16.5% and 12%, respectively. (2) An ablation study shows that QAMR's mechanisms for handling multi-release documents directly improve answer accuracy. (3) Compared to its component-ablated variants, QAMR achieves a 19.6% average gain in answer correctness and a 14.0% average gain in retrieval accuracy over the best ablation. (4) QAMR reduces response time by 8% on average relative to the baseline. (5) The automatically computed accuracy metrics used in our evaluation strongly correlate with expert human assessments, validating the reliability of our methodology.

preprint2022arXiv

A Domain-Specific Language for Simulation-Based Testing of IoT Edge-to-Cloud Solutions

The Internet of things (IoT) is increasingly prevalent in domains such as emergency response, smart cities and autonomous vehicles. Simulation plays a key role in the testing of IoT systems, noting that field testing of a complete IoT product may be infeasible or prohibitively expensive. In this paper, we propose a domain-specific language (DSL) for generating edge-to-cloud simulators. An edge-to-cloud simulator executes the functionality of a large array of edge devices that communicate with cloud applications. Our DSL, named IoTECS, is the result of a collaborative project with an IoT analytics company, Cheetah Networks. The industrial use case that motivates IoTECS is ensuring the scalability of cloud applications by putting them under extreme loads from IoT devices connected to the edge. We implement IoTECS using Xtext and empirically evaluate its usefulness. We further reflect on the lessons learned.

preprint2022arXiv

Learning Self-adaptations for IoT Networks: A Genetic Programming Approach

Internet of Things (IoT) is a pivotal technology in application domains that require connectivity and interoperability between large numbers of devices. IoT systems predominantly use a software-defined network (SDN) architecture as their core communication backbone. This architecture offers several advantages, including the flexibility to make IoT networks self-adaptive through software programmability. In general, self-adaptation solutions need to periodically monitor, reason about, and adapt a running system. The adaptation step involves generating an adaptation strategy and applying it to the running system whenever an anomaly arises. In this paper, we argue that, rather than generating individual adaptation strategies, the goal should be to adapt the logic / code of the running system in such a way that the system itself would learn how to steer clear of future anomalies, without triggering self-adaptation too frequently. We instantiate and empirically assess this idea in the context of IoT networks. Specifically, using genetic programming (GP), we propose a self-adaptation solution that continuously learns and updates the control constructs in the data-forwarding logic of SDN-based IoT networks. Our evaluation, performed using open-source synthetic and industrial data, indicates that, compared to a baseline adaptation technique that attempts to generate individual adaptations, our GP-based approach is more effective in resolving network congestion, and further, reduces the frequency of adaptation interventions over time. In addition, we compare our approach against a standard data-forwarding algorithm from the network literature, demonstrating that our approach significantly reduces packet loss.

preprint2022arXiv

TAPHSIR: Towards AnaPHoric Ambiguity Detection and ReSolution In Requirements

We introduce TAPHSIR, a tool for anaphoric ambiguity detection and anaphora resolution in requirements. TAPHSIR facilities reviewing the use of pronouns in a requirements specification and revising those pronouns that can lead to misunderstandings during the development process. To this end, TAPHSIR detects the requirements which have potential anaphoric ambiguity and further attempts interpreting anaphora occurrences automatically. TAPHSIR employs a hybrid solution composed of an ambiguity detection solution based on machine learning and an anaphora resolution solution based on a variant of the BERT language model. Given a requirements specification, TAPHSIR decides for each pronoun occurrence in the specification whether the pronoun is ambiguous or unambiguous, and further provides an automatic interpretation for the pronoun. The output generated by TAPHSIR can be easily reviewed and validated by requirements engineers. TAPHSIR is publicly available on Zenodo (DOI: 10.5281/zenodo.5902117).

preprint2022arXiv

WikiDoMiner: Wikipedia Domain-specific Miner

We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (DOI: 10.5281/zenodo.6671357).

preprint2020arXiv

An Automated Framework for the Extraction of Semantic Legal Metadata from Legal Texts

Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4% and recall scores of 94.9% and 92.4%.

preprint2020arXiv

Dynamic Adaptation of Software-defined Networks for IoT Systems: A Search-based Approach

The concept of Internet of Things (IoT) has led to the development of many complex and critical systems such as smart emergency management systems. IoT-enabled applications typically depend on a communication network for transmitting large volumes of data in unpredictable and changing environments. These networks are prone to congestion when there is a burst in demand, e.g., as an emergency situation is unfolding, and therefore rely on configurable software-defined networks (SDN). In this paper, we propose a dynamic adaptive SDN configuration approach for IoT systems. The approach enables resolving congestion in real time while minimizing network utilization, data transmission delays and adaptation costs. Our approach builds on existing work in dynamic adaptive search-based software engineering (SBSE) to reconfigure an SDN while simultaneously ensuring multiple quality of service criteria. We evaluate our approach on an industrial national emergency management system, which is aimed at detecting disasters and emergencies, and facilitating recovery and rescue operations by providing first responders with a reliable communication infrastructure. Our results indicate that (1) our approach is able to efficiently and effectively adapt an SDN to dynamically resolve congestion, and (2) compared to two baseline data forwarding algorithms that are static and non-adaptive, our approach increases data transmission rate by a factor of at least 3 and decreases data loss by at least 70%.

preprint2020arXiv

Model Driven Engineering for Data Protection and Privacy: Application and Experience with GDPR

In Europe and indeed worldwide, the General Data Protection Regulation (GDPR) provides protection to individuals regarding their personal data in the face of new technological developments. GDPR is widely viewed as the benchmark for data protection and privacy regulations that harmonizes data privacy laws across Europe. Although the GDPR is highly beneficial to individuals, it presents significant challenges for organizations monitoring or storing personal information. Since there is currently no automated solution with broad industrial applicability, organizations have no choice but to carry out expensive manual audits to ensure GDPR compliance. In this paper, we present a complete GDPR UML model as a first step towards designing automated methods for checking GDPR compliance. Given that the practical application of the GDPR is influenced by national laws of the EU Member States, we suggest a two-tiered description of the GDPR, generic and specialized. In this paper, we provide (1) the GDPR conceptual model we developed with complete traceability from its classes to the GDPR, (2) a glossary to help understand the model, (3) the plain-English description of 35 compliance rules derived from GDPR along with their encoding in OCL, and (4) the set of 20 variations points derived from GDPR to specialize the generic model. We further present the challenges we faced in our modeling endeavor, the lessons we learned from it, and future directions for research.

preprint2020arXiv

On Systematically Building a Controlled Natural Language for Functional Requirements

[Context] Natural language (NL) is pervasive in software requirements specifications (SRSs). However, despite its popularity and widespread use, NL is highly prone to quality issues such as vagueness, ambiguity, and incompleteness. Controlled natural languages (CNLs) have been proposed as a way to prevent quality problems in requirements documents, while maintaining the flexibility to write and communicate requirements in an intuitive and universally understood manner. [Objective] In collaboration with an industrial partner from the financial domain, we systematically develop and evaluate a CNL, named Rimay, intended at helping analysts write functional requirements. [Method] We rely on Grounded Theory for building Rimay and follow well-known guidelines for conducting and reporting industrial case study research. [Results] Our main contributions are: (1) a qualitative methodology to systematically define a CNL for functional requirements; this methodology is general and applicable to information systems beyond the financial domain, (2) a CNL grammar to represent functional requirements; this grammar is derived from our experience in the financial domain, but should be applicable, possibly with adaptations, to other information-system domains, and (3) an empirical evaluation of our CNL (Rimay) through an industrial case study. Our contributions draw on 15 representative SRSs, collectively containing 3215 NL requirements statements from the financial domain. [Conclusion] Our evaluation shows that Rimay is expressive enough to capture, on average, 88% (405 out of 460) of the NL requirements statements in four previously unseen SRSs from the financial domain.

preprint2020arXiv

Practical Constraint Solving for Generating System Test Data

The ability to generate test data is often a necessary prerequisite for automated software testing. For the generated data to be fit for its intended purpose, the data usually has to satisfy various logical constraints. When testing is performed at a system level, these constraints tend to be complex and are typically captured in expressive formalisms based on first-order logic. Motivated by improving the feasibility and scalability of data generation for system testing, we present a novel approach, whereby we employ a combination of metaheuristic search and Satisfiability Modulo Theories (SMT) for constraint solving. Our approach delegates constraint solving tasks to metaheuristic search and SMT in such a way as to take advantage of the complementary strengths of the two techniques. We ground our work on test data models specified in UML, with OCL used as the constraint language. We present tool support and an evaluation of our approach over three industrial case studies. The results indicate that, for complex system test data generation problems, our approach presents substantial benefits over the state of the art in terms of applicability and scalability.