Researcher profile

Michael Mathioudakis

Michael Mathioudakis contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works
0 followers
9 topics
4 close collaborators

Published work

8 published items

preprint · 2026 · arXiv

Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke

While digitized corpora have transformed the study of intellectual transmission, current methods rely heavily on lexical text reuse detection, capturing verbatim quotations but fundamentally missing paraphrases and complex implicit engagement. This paper evaluates semantic search in 18th-century intellectual history through the reception of John Locke's foundational work. Using expert annotation grounded in a semantic taxonomy, we examine whether an off-the-shelf semantic search pipeline can surface meaning-level correspondences overlooked by lexical methods. Our results demonstrate that semantic search retrieves substantially more implicit receptions than lexical baselines. However, linguistic diagnostics also reveal a "lexical gatekeeping" effect, where retrieval remains partially constrained by surface vocabulary overlap. These findings highlight both the potential and the limitations of semantic retrieval for analyzing the circulation of ideas in large historical corpora. The data is available at https://github.com/COMHIS/locke-sim-data.

preprint · 2021 · arXiv

Affirmative Action Policies for Top-k Candidates Selection, With an Application to the Design of Policies for University Admissions

We consider the problem of designing affirmative action policies for selecting the top-k candidates from a pool of applicants. We assume that for each candidate we have socio-demographic attributes and a series of variables that serve as indicators of future performance (e.g., results on standardized tests). We further assume that we have access to historical data including the actual performance of previously selected candidates. Critically, performance information is only available for candidates who were selected under some previous selection policy. In this work we assume that due to legal requirements or voluntary commitments, an organization wants to increase the presence of people from disadvantaged socio-demographic groups among the selected candidates. Hence, we seek to design an affirmative action or positive action policy. This policy has two concurrent objectives: (i) to select candidates who, given what can be learnt from historical data, are more likely to perform well, and (ii) to select candidates in a way that increases the representation of disadvantaged socio-demographic groups. Our motivating application is the design of university admission policies to bachelor's degrees. We use a causal model as a framework to describe several families of policies (changing component weights, giving bonuses, and enacting quotas), and compare them both theoretically and through extensive experimentation on a large real-world dataset containing thousands of university applicants. Our paper is the first to place the problem of affirmative-action policy design within the framework of algorithmic fairness. Our empirical results indicate that simple policies could favor the admission of disadvantaged groups without significantly compromising on the quality of accepted candidates.
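
Two of the policy families named in this abstract, bonuses and quotas, can be illustrated with a minimal sketch. It is not the paper's causal model or its evaluated policies: the `Candidate` record, the single binary `disadvantaged` flag, and both function names are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    score: float          # hypothetical predictor of future performance
    disadvantaged: bool   # hypothetical binary group flag


def select_with_bonus(candidates: List[Candidate], k: int, bonus: float) -> List[Candidate]:
    """Rank by score plus a flat bonus for disadvantaged candidates; keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda c: c.score + (bonus if c.disadvantaged else 0.0),
        reverse=True,
    )
    return ranked[:k]


def select_with_quota(candidates: List[Candidate], k: int, min_disadvantaged: int) -> List[Candidate]:
    """Reserve seats for the best disadvantaged candidates, then fill the rest by score."""
    by_score = sorted(candidates, key=lambda c: c.score, reverse=True)
    reserved = [c for c in by_score if c.disadvantaged][: min(k, min_disadvantaged)]
    rest = [c for c in by_score if c not in reserved][: k - len(reserved)]
    return sorted(reserved + rest, key=lambda c: c.score, reverse=True)


pool = [
    Candidate("A", 90, False),
    Candidate("B", 85, False),
    Candidate("C", 80, True),
    Candidate("D", 78, False),
    Candidate("E", 75, True),
]
print([c.name for c in select_with_bonus(pool, k=3, bonus=12.0)])
print([c.name for c in select_with_quota(pool, k=3, min_disadvantaged=2)])
```

Both policies trade some total score for representation; how large that trade-off is on real data is the empirical question the abstract addresses.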

preprint · 2021 · arXiv

Fair and Representative Subset Selection from Data Streams

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint $k$. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a $(\frac{1}{2}-\varepsilon)$-approximation algorithm that requires $O(\frac{1}{\varepsilon} \log \frac{k}{\varepsilon})$ passes over the stream for any constant $\varepsilon > 0$. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of $(\frac{1}{2}-\varepsilon)$ when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
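
To make the constrained objective concrete, here is a small sketch of maximum coverage under per-group caps. It is an offline greedy baseline, not the multi-pass or single-pass streaming algorithms the abstract describes; the function name `fair_greedy_cover` and the data layout (item id mapped to a group and a set of covered elements) are assumptions for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


def fair_greedy_cover(
    items: Dict[str, Tuple[str, Set[Hashable]]],
    k: int,
    group_caps: Dict[str, int],
) -> Tuple[List[str], Set[Hashable]]:
    """Greedily pick up to k items maximizing coverage, with at most
    group_caps[g] items from each group g (groups without a cap are skipped)."""
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    used = {g: 0 for g in group_caps}
    while len(chosen) < k:
        best, best_gain = None, 0
        for item_id, (group, elems) in items.items():
            if item_id in chosen or used.get(group, 0) >= group_caps.get(group, 0):
                continue
            gain = len(elems - covered)  # marginal coverage gain of this item
            if gain > best_gain:
                best, best_gain = item_id, gain
        if best is None:  # nothing feasible adds new coverage
            break
        group, elems = items[best]
        chosen.append(best)
        covered |= elems
        used[group] += 1
    return chosen, covered


items = {
    "a": ("red", {1, 2, 3, 4}),
    "b": ("red", {3, 4, 5, 6}),
    "c": ("blue", {5, 6}),
    "d": ("blue", {7}),
}
chosen, covered = fair_greedy_cover(items, k=3, group_caps={"red": 1, "blue": 2})
print(chosen, sorted(covered))
```

Coverage is a monotone submodular function, so this toy instance matches the problem class in the abstract; a streaming algorithm must reach similar decisions without revisiting items freely.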

preprint · 2021 · arXiv

Intersectional Affirmative Action Policies for Top-k Candidates Selection

We study the problem of selecting the top-k candidates from a pool of applicants, where each candidate is associated with a score indicating their aptitude. Depending on the specific scenario, such as job search or college admissions, these scores may be the results of standardized tests or other predictors of future performance and utility. We consider a situation in which some groups of candidates experience historical and present disadvantage that makes their chances of being accepted much lower than other groups. In these circumstances, we wish to apply an affirmative action policy to reduce acceptance rate disparities, while avoiding any large decrease in the aptitude of the candidates that are eventually selected. Our algorithmic design is motivated by the frequently observed phenomenon that discrimination disproportionately affects individuals who simultaneously belong to multiple disadvantaged groups, defined along intersecting dimensions such as gender, race, sexual orientation, socio-economic status, and disability. In short, our algorithm's objective is to simultaneously select candidates with high utility and level up the representation of disadvantaged intersectional classes. This naturally involves trade-offs and is computationally challenging due to the combinatorial explosion of potential subgroups as more attributes are considered. We propose two algorithms to solve this problem, analyze them, and evaluate them experimentally using a dataset of university application scores and admissions to bachelor's degrees in an OECD country. Our conclusion is that it is possible to significantly reduce disparities in admission rates affecting intersectional classes with a small loss in terms of selected candidate aptitude. To the best of our knowledge, we are the first to study fairness constraints with regard to intersectional classes in the context of top-k selection.

preprint · 2021 · arXiv

Query the model: precomputations for efficient inference with Bayesian Networks

Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method for Variable Elimination, which can lead to significant efficiency gains when answering inference queries. We evaluate our technique using real-world Bayesian networks. Our results show that a modest amount of materialization can lead to significant improvements in the running time of queries. Furthermore, in comparison with junction tree methods that also rely on materialization, our approach achieves comparable efficiency during inference using significantly lighter materialization.
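
For readers unfamiliar with Variable Elimination, the sketch below runs it on a three-node chain A → B → C with binary variables, then reuses a cached intermediate factor for a second query. The caching step is only a simple illustration of the idea of materialization; it is not the paper's method for choosing what to materialize. The factor representation and all probabilities are invented for the example.

```python
from itertools import product

# A factor is (variables, table): a tuple of names and a dict mapping
# tuples of 0/1 values (one per variable) to a number.


def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for values in product((0, 1), repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        out_tab[values] = (f_tab[tuple(row[v] for v in f_vars)]
                           * g_tab[tuple(row[v] for v in g_vars)])
    return out_vars, out_tab


def sum_out(f, var):
    """Marginalize one variable out of a factor."""
    f_vars, f_tab = f
    idx = f_vars.index(var)
    out_tab = {}
    for values, p in f_tab.items():
        key = values[:idx] + values[idx + 1:]
        out_tab[key] = out_tab.get(key, 0.0) + p
    return f_vars[:idx] + f_vars[idx + 1:], out_tab


def eliminate(factors, order):
    """Eliminate variables in the given order; return the remaining factor."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        factors = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_c_given_b = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

# Materialize the factor over B once, then reuse it for a query on C.
cached_b = eliminate([p_a, p_b_given_a], ["A"])
marginal_c = eliminate([cached_b, p_c_given_b], ["B"])
print(marginal_c)
```

Any later query that would eliminate A first can start from `cached_b` instead of recomputing it, which is the kind of saving that materialization targets.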

preprint · 2020 · arXiv

GRMR: Generalized Regret-Minimizing Representatives

Extracting a small subset of representative tuples from a large database is an important task in multi-criteria decision making. The regret-minimizing set (RMS) problem was recently proposed for representative discovery from databases. Specifically, for a set of tuples (points) in $d$ dimensions, an RMS problem finds the smallest subset such that, for any possible ranking function, the relative difference in scores between the top-ranked point in the subset and the top-ranked point in the entire database is within a parameter $\varepsilon \in (0,1)$. Although RMS and its variations have been extensively investigated in the literature, existing approaches only consider the class of nonnegative (monotonic) linear functions for ranking, which have limitations in modeling user preferences and decision-making processes. To address this issue, we define the generalized regret-minimizing representative (GRMR) problem that extends RMS by taking into account all linear functions including non-monotonic ones with negative weights. For two-dimensional databases, we propose an optimal algorithm for GRMR via a transformation into the shortest cycle problem in a directed graph. Since GRMR is proven to be NP-hard even in three dimensions, we further develop a polynomial-time heuristic algorithm for GRMR on databases in arbitrary dimensions. Finally, we conduct extensive experiments on real and synthetic datasets to confirm the efficiency, effectiveness, and scalability of our proposed algorithms.
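
The "relative difference in scores" in this abstract can be made concrete with a short sketch that estimates the worst-case regret ratio of a subset in two dimensions. It covers only the classical RMS setting with nonnegative weights and strictly positive point coordinates; the generalized setting with negative weights needs the paper's own definitions, which are not reproduced here. Sampling weight directions on a grid is an approximation chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def regret_ratio(database: List[Point], subset: List[Point], w: Sequence[float]) -> float:
    """Relative score loss of the best subset point versus the best database point
    under the linear ranking function with weights w (assumes a positive top score)."""
    def score(p: Point) -> float:
        return sum(wi * pi for wi, pi in zip(w, p))

    best_all = max(score(p) for p in database)
    best_sub = max(score(p) for p in subset)
    return (best_all - best_sub) / best_all


def max_regret_ratio_2d(database: List[Point], subset: List[Point], steps: int = 90) -> float:
    """Approximate the maximum regret ratio over nonnegative weight directions
    by scanning angles from 0 to 90 degrees on a uniform grid."""
    worst = 0.0
    for i in range(steps + 1):
        theta = (math.pi / 2) * i / steps
        w = (math.cos(theta), math.sin(theta))
        worst = max(worst, regret_ratio(database, subset, w))
    return worst


database = [(10.0, 1.0), (1.0, 10.0), (7.0, 7.0)]
subset = [(10.0, 1.0), (1.0, 10.0)]
print(max_regret_ratio_2d(database, subset))
```

In this example the subset omits the balanced point (7, 7), so a user who weights both attributes equally loses the most; a subset is an acceptable answer to RMS only if this worst-case value stays within $\varepsilon$.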

preprint · 2020 · arXiv

Towards Data-Driven Affirmative Action Policies under Uncertainty

In this paper, we study university admissions under a centralized system that uses grades and standardized test scores to match applicants to university programs. We consider affirmative action policies that seek to increase the number of admitted applicants from underrepresented groups. Since such a policy has to be announced before the start of the application period, there is uncertainty about the score distribution of the students applying to each program. This poses a difficult challenge for policy-makers. We explore the possibility of using a predictive model trained on historical data to help optimize the parameters of such policies.

preprint · 2018 · arXiv

Markov Chain Monitoring

In networking applications, one often wishes to obtain estimates about the number of objects at different parts of the network (e.g., the number of cars at an intersection of a road network or the number of packets expected to reach a node in a computer network) by monitoring the traffic in a small number of network nodes or edges. We formalize this task by defining the 'Markov Chain Monitoring' problem. Given an initial distribution of items over the nodes of a Markov chain, we wish to estimate the distribution of items at subsequent times. We do this by asking a limited number of queries that retrieve, for example, how many items transitioned to a specific node or over a specific edge at a particular time. We consider different types of queries, each defining a different variant of the Markov chain monitoring problem. For each variant, we design efficient algorithms for choosing the queries that make our estimates as accurate as possible. In our experiments with synthetic and real datasets we demonstrate the efficiency and the efficacy of our algorithms in a variety of settings.
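
The baseline estimate that such queries would refine is the expected distribution obtained by pushing the initial item counts through the transition matrix. The sketch below computes that expectation and the expected flow over one edge. It does not implement query selection, which is the paper's contribution, and the three-node chain and function names are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def step(counts: List[float], transition: Matrix) -> List[float]:
    """Expected item counts after one step: new[j] = sum_i counts[i] * P[i][j]."""
    n = len(counts)
    return [sum(counts[i] * transition[i][j] for i in range(n)) for j in range(n)]


def expected_counts(initial: List[float], transition: Matrix, steps: int) -> List[List[float]]:
    """Expected counts at times 0..steps, starting from the initial distribution."""
    history = [list(initial)]
    for _ in range(steps):
        history.append(step(history[-1], transition))
    return history


def expected_edge_flow(counts: List[float], transition: Matrix, src: int, dst: int) -> float:
    """Expected number of items moving over edge (src, dst) in the next step."""
    return counts[src] * transition[src][dst]


P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
history = expected_counts([100.0, 0.0, 0.0], P, steps=2)
print(history)
print(expected_edge_flow(history[1], P, src=1, dst=2))
```

A node or edge query at a given time replaces one of these expected values with an observed one, which is why the choice of queries affects how accurate the remaining estimates are.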