Source author record

Mohammad Reza Mousavi

Mohammad Reza Mousavi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Logic in Computer Science Software Engineering Artificial Intelligence Programming Languages Systems and Control

Catalog footprint

What is connected

11works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

(How) Do Large Language Models Understand High-Level Message Sequence Charts?

Large Language Models (LLMs) are being employed widely to automate tasks across the software development life-cycle. It is, however, unclear whether these tasks are performed consistently with respect to the semantics of the artefacts being handled. This question is particularly under-researched concerning architectural design specification. In this paper, we address this question for High-Level Message Sequence Charts (HMSCs). These are visual models with a rigorous formal semantics that have been used for various purposes, including as a foundation for Sequence Diagrams in the Unified Modelling Language (UML). We examine whether LLMs "understand" the semantics of HMSCs by examining three LLMs (Gemini-3, GPT-5.4, and Qwen-3.6) on how they perform 129 semantic tasks ranging from querying basic semantic constructs in HMSCs (i.e., events and their ordering) to semantic-preserving abstractions and compositions, and calculating the set of traces and trace-equivalent labelled transition systems. The results show that LLMs only have a modest understanding of the formal semantics of HMSCs (ca. 52% overall accuracy), with great variability across different semantic concepts: while LLMs seem to understand the basic semantic concepts of MSCs (ca. 88% accuracy), they struggle with semantic reasoning in tasks involving abstraction and composition (ca. 36% accuracy) and traces and LTSs (ca. 42% accuracy). In particular, all three LLMs struggle with the notions of co-region and explicit causal dependencies and never employed them in semantic-preserving transformations.

preprint2026arXiv

FairMedQA: Benchmarking Bias in Large Language Models for Medical Question Answering

Large language models (LLMs) are approaching expert-level performance in medical question answering (QA), demonstrating strong potential to improve public healthcare. However, underlying biases related to sensitive attributes such as sex and race pose life-critical risks. The extent to which such sensitive attributes affect diagnosis remains an open question and requires comprehensive empirical investigation. Additionally, even the latest Counterfactual Patient Variations (CPV) benchmark can hardly distinguish the bias levels of different LLMs. To further explore these dynamics, we propose a new benchmark, FairMedQA, and benchmark 12 representative LLMs. FairMedQA contains 4,806 counterfactual question pairs constructed from 801 clinical vignettes. Our results reveal substantial accuracy disparity ranging from 3 to 19 percentage points across sensitive demographic groups. Notably, FairMedQA exposes biases that are at least 12 percentage points larger than those identified by the latest CPV benchmark, presenting superior benchmarking sensitivity. Our results underscore an urgent need for targeted debiasing techniques and more rigorous, identity-aware validation protocols before LLMs can be safely integrated into practical clinical decision-support systems.

preprint2022arXiv

A Benchmark for Active Learning of Variability-Intensive Systems

Behavioral models are the key enablers for behavioral analysis of Software Product Lines (SPL), including testing and model checking. Active model learning comes to the rescue when family behavioral models are non-existent or outdated. A key challenge on active model learning is to detect commonalities and variability efficiently and combine them into concise family models. Benchmarks and their associated metrics will play a key role in shaping the research agenda in this promising field and provide an effective means for comparing and identifying relative strengths and weaknesses in the forthcoming techniques. In this challenge, we seek benchmarks to evaluate the efficiency (e.g., learning time and memory footprint) and effectiveness (e.g., conciseness and accuracy of family models) of active model learning methods in the software product line context. These benchmark sets must contain the structural and behavioral variability models of at least one SPL. Each SPL in a benchmark must contain products that requires more than one round of model learning with respect to the basic active learning $L^{*}$ algorithm. Alternatively, tools supporting the synthesis of artificial benchmark models are also welcome.

preprint2022arXiv

Adaptive Behavioral Model Learning for Software Product Lines

Behavioral models enable the analysis of the functionality of software product lines (SPL), e.g., model checking and model-based testing. Model learning aims at constructing behavioral models for software systems in some form of a finite state machine. Due to the commonalities among the products of an SPL, it is possible to reuse the previously learned models during the model learning process. In this paper, an adaptive approach (the $\text{PL}^*$ method) for learning the product models of an SPL is presented based on the well-known $L^*$ algorithm. In this method, after model learning of each product, the sequences in the final observation table are stored in a repository which will be used to initialize the observation table of the remaining products to be learned. The proposed algorithm is evaluated on two open-source SPLs and the total learning cost is measured in terms of the number of rounds, the total number of resets and input symbols. The results show that for complex SPLs, the total learning cost for the $\text{PL}^*$ method is significantly lower than that of the non-adaptive learning method in terms of all three metrics. Furthermore, it is observed that the order in which the products are learned affects the efficiency of the $\text{PL}^*$ method. Based on this observation, we introduced a heuristic to determine an ordering which reduces the total cost of adaptive learning in both case studies.

preprint2016arXiv

(De-)Composing Causality in Labeled Transition Systems

In this paper we introduce a notion of counterfactual causality in the Halpern and Pearl sense that is compositional with respect to the interleaving of transition systems. The formal framework for reasoning on what caused the violation of a safety property is established in the context of labeled transition systems and Hennessy Milner logic. The compositionality results are devised for non-communicating systems.

preprint2016arXiv

Towards an Approximate Conformance Relation for Hybrid I/O Automata

Several notions of conformance have been proposed for checking the behavior of cyber-physical systems against their hybrid systems models. In this paper, we explore the initial idea of a notion of approximate conformance that allows for comparison of both observable discrete actions and (sampled) continuous trajectories. As such, this notion will consolidate two earlier notions, namely the notion of Hybrid Input-Output Conformance (HIOCO) by M. van Osch and the notion of Hybrid Conformance by H. Abbas and G.E. Fainekos. We prove that our proposed notion of conformance satisfies a semi-transitivity property, which makes it suitable for a step-wise proof of conformance or refinement.

preprint2014arXiv

Spinal Test Suites for Software Product Lines

A major challenge in testing software product lines is efficiency. In particular, testing a product line should take less effort than testing each and every product individually. We address this issue in the context of input-output conformance testing, which is a formal theory of model-based testing. We extend the notion of conformance testing on input-output featured transition systems with the novel concept of spinal test suites. We show how this concept dispenses with retesting the common behavior among different, but similar, products of a software product line.

preprint2013arXiv

Algebraic Meta-Theory of Processes with Data

There exists a rich literature of rule formats guaranteeing different algebraic properties for formalisms with a Structural Operational Semantics. Moreover, there exist a few approaches for automatically deriving axiomatizations characterizing strong bisimilarity of processes. To our knowledge, this literature has never been extended to the setting with data (e.g. to model storage and memory). We show how the rule formats for algebraic properties can be exploited in a generic manner in the setting with data. Moreover, we introduce a new approach for deriving sound and ground-complete axiom schemata for a notion of bisimilarity with data, called stateless bisimilarity, based on intuitive auxiliary function symbols for handling the store component. We do restrict, however, the axiomatization to the setting where the store component is only given in terms of constants.

preprint2013arXiv

Decomposability in Input Output Conformance Testing

We study the problem of deriving a specification for a third-party component, based on the specification of the system and the environment in which the component is supposed to reside. Particularly, we are interested in using component specifications for conformance testing of black-box components, using the theory of input-output conformance (ioco) testing. We propose and prove sufficient criteria for decompositionality, i.e., that components conforming to the derived specification will always compose to produce a correct system with respect to the system specification. We also study the criteria for strong decomposability, by which we can ensure that only those components conforming to the derived specification can lead to a correct system.

preprint2011arXiv

Proceedings 10th International Workshop on the Foundations of Coordination Languages and Software Architectures

Computation nowadays is becoming inherently concurrent, either because of characteristics of the hardware (with multicore processors becoming omnipresent) or due to the ubiquitous presence of distributed systems (incarnated in the Internet). Computational systems are therefore typically distributed, concurrent, mobile, and often involve composition of heterogeneous components. To specify and reason about such systems and go beyond the functional correctness proofs, e.g., by supporting reusability and improving maintainability, approaches such as coordination languages and software architecture are recognised as fundamental. The goal of the this workshop is to put together researchers and practitioners of the aforementioned fields, to share and identify common problems, and to devise general solutions in the context of coordination languages and software architectures.

preprint2011arXiv

Proceedings First International Workshop on Process Algebra and Coordination

Process algebra provides abstract and rigorous means for studying communicating concurrent systems. Coordination languages also provide abstract means for the specifying and programming communication of components. Hence, the two fields seem to have very much in common and the link between these two research areas have been established formally by means of several translations, mainly from coordination languages to process algebras. There have also been proposals of process algebras whose communication policy is inspired by the one underlying coordination languages. The aim of this workshop was to push the state of the art in the study of the connections between process algebra and coordination languages by bringing together experts as well as young researchers from the two fields to communicate their ideas and findings. It includes both contributed and invited papers that have been presented during the one day meeting on Process Algebra and Coordination (PACO 2011) which took place on June 9, 2011 in Reykjavik, Iceland.

Mohammad Reza Mousavi

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

(How) Do Large Language Models Understand High-Level Message Sequence Charts?

FairMedQA: Benchmarking Bias in Large Language Models for Medical Question Answering

A Benchmark for Active Learning of Variability-Intensive Systems

Adaptive Behavioral Model Learning for Software Product Lines

(De-)Composing Causality in Labeled Transition Systems

Towards an Approximate Conformance Relation for Hybrid I/O Automata

Spinal Test Suites for Software Product Lines

Algebraic Meta-Theory of Processes with Data

Decomposability in Input Output Conformance Testing

Proceedings 10th International Workshop on the Foundations of Coordination Languages and Software Architectures

Proceedings First International Workshop on Process Algebra and Coordination