Researcher profile

Yannis Tzitzikas

Yannis Tzitzikas contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 (Unverified) | Verification L1 | Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published item(s)

Preprint | 2026 | arXiv

It's not the Language Model, it's the Tool: Deterministic Mediation for Scientific Workflows

Language models can produce convincing scientific analyses, but repeated generations on the same data do not guarantee the same result. A researcher may regenerate an identical query and receive a different fit, a different peak position or a different analysis procedure, without an obvious way to decide which output to trust. We propose typed mediation, a pattern in which the model orchestrates deterministic tools rather than generating analytical code. Each tool encodes one researcher's exact procedure for one instrument, ported through structured interviews. The model selects which tool to call and with what parameters. The tool produces the result. Regeneration does not change it. We evaluate this claim by running the same photoluminescence analysis on four platforms, including three commercial foundation models, four times each with the same prompt. The typed tool produces identical results across all runs. The commercial platforms either vary in numerical output and analytical methodology across runs, or fail to produce valid results on the task. We deploy this pattern on two instruments serving users over approximately six months, with very positive user feedback. Both cases are very challenging: they involve proprietary binary formats and per-seat licensed software, which force the tool to remain on local infrastructure alongside the data and the instrument it operates. We argue that deployment topology is not just a preference, but a structural requirement of scientific tool mediation. The result is a practical pattern for deploying language models in scientific workflows where reproducibility is mandatory, reducing analysis time from weeks to minutes while guaranteeing identical outputs across runs.
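
The mediation pattern described in this abstract can be illustrated with a minimal sketch. Everything named here (`PeakParams`, `find_peak`, `TOOLS`, `mediate`, and the toy spectrum) is a hypothetical stand-in and not the authors' implementation; the peak routine is a simple weighted centroid chosen only because it is deterministic. The point is the division of labor: the model supplies a tool name and parameters, and the tool alone computes the result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class PeakParams:
    """Typed parameters the model is allowed to choose (hypothetical)."""
    window: int  # half-width, in samples, of the centroid window


def find_peak(signal: Sequence[float], params: PeakParams) -> float:
    """Deterministic tool: intensity-weighted centroid around the maximum."""
    if params.window < 0:
        raise ValueError("window must be non-negative")
    top = max(range(len(signal)), key=lambda i: signal[i])
    lo = max(0, top - params.window)
    hi = min(len(signal), top + params.window + 1)
    total = sum(signal[lo:hi])
    if total == 0:
        raise ValueError("signal has no intensity inside the window")
    return sum(i * signal[i] for i in range(lo, hi)) / total


# Registry of tools the model may select from; it never writes analysis code.
TOOLS: Dict[str, Callable[[Sequence[float], PeakParams], float]] = {
    "find_peak": find_peak,
}


def mediate(tool_name: str, raw_params: dict, signal: Sequence[float]) -> float:
    """Validate the model's choice, then let the tool compute the result."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    params = PeakParams(**raw_params)  # rejects unexpected parameter names
    return TOOLS[tool_name](signal, params)


# Toy data (illustrative only): the same call repeated four times.
spectrum = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
runs = [mediate("find_peak", {"window": 2}, spectrum) for _ in range(4)]
```

Because `mediate` only dispatches to a pure function, repeating the same call cannot change the answer; any variation would have to come from the model choosing a different tool or different parameters, which is visible in the call itself.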

Preprint | 2021 | arXiv

CS563-QA: A Collection for Evaluating Question Answering Systems

Question Answering (QA) is a challenging topic since it requires tackling the various difficulties of natural language understanding. Since evaluation is important not only for identifying the strong and weak points of the various techniques for QA, but also for facilitating the inception of new methods and techniques, in this paper we present a collection for evaluating QA methods over free text that we have created. Although it is a small collection, it contains cases of increasing difficulty, therefore it has an educational value and it can be used for rapid evaluation of QA systems.

Preprint | 2012 | arXiv

Information Carriers and Identification of Information Objects: An Ontological Approach

Even though library and archival practice, as well as Digital Preservation, have a long tradition in identifying information objects, the question of their precise identity under change of carrier or migration is still a riddle to science. The objective of this paper is to provide criteria for the unique identification of some important kinds of information objects, independent from the kind of carrier or specific encoding. Our approach is based on the idea that the substance of some kinds of information objects can completely be described in terms of discrete arrangements of finite numbers of known kinds of symbols, such as those implied by style guides for scientific journal submissions. Our theory is also useful for selecting or describing what has to be preserved. This is a fundamental problem since curators and archivists would like to formally record the decisions of what has to be preserved over time and to decide (or verify) whether a migration (transformation) preserves the intended information content. Furthermore, it is important for reasoning about the authenticity of digital objects, as well as for reducing the cost of digital preservation.

Preprint | 2011 | arXiv

Query processing in distributed, taxonomy-based information sources

We address the problem of answering queries over a distributed information system that stores objects indexed by terms organized in a taxonomy. The taxonomy consists of subsumption relationships between negation-free DNF formulas on terms and negation-free conjunctions of terms. In the first part of the paper, we consider the centralized case, deriving a hypergraph-based algorithm that is efficient in data complexity. In the second part of the paper, we consider the distributed case, presenting alternative ways of implementing the centralized algorithm. These alternatives follow from two basic criteria: direct vs. query-rewriting evaluation, and centralized vs. distributed data or taxonomy allocation. Combinations of these criteria make it possible to cover a wide spectrum of architectures, ranging from client-server to peer-to-peer. We evaluate the performance of the various architectures by simulation on a network with O(10^4) nodes, and derive final results. Finally, an extensive review of the relevant literature is included.
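
As a rough illustration of the centralized case, the sketch below answers a single-term query by collecting the objects indexed under that term or under any term it subsumes. This is a deliberate simplification and an assumption on our part: it handles only term-to-term subsumption, not the DNF formulas or the hypergraph-based algorithm the abstract describes, and the names `narrower_closure`, `answer`, and the toy taxonomy are hypothetical.

```python
from typing import Dict, Set


def narrower_closure(broader: Dict[str, Set[str]], term: str) -> Set[str]:
    """Return `term` plus every term transitively subsumed by it."""
    # Invert the child -> parents map into parent -> children.
    children: Dict[str, Set[str]] = {}
    for child, parents in broader.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    result, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def answer(broader: Dict[str, Set[str]],
           index: Dict[str, Set[str]],
           term: str) -> Set[str]:
    """Objects indexed under `term` or under any narrower term."""
    return {obj
            for t in narrower_closure(broader, term)
            for obj in index.get(t, set())}


# Toy taxonomy and index (illustrative only).
broader = {
    "Cameras": {"Electronics"},
    "DSLR": {"Cameras"},
    "Phones": {"Electronics"},
}
index = {"DSLR": {"o1"}, "Cameras": {"o2"}, "Phones": {"o3"}}

electronics = answer(broader, index, "Electronics")
cameras = answer(broader, index, "Cameras")
```

In a distributed setting the same closure could be computed where the taxonomy lives and the object lookup where the data lives, which is one way to picture the allocation choices the abstract mentions.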

Preprint | 2011 | arXiv

Similarity-based Browsing over Linked Open Data

An increasing amount of data is published on the Web according to the Linked Open Data (LOD) principles. End users would like to browse these data in a flexible manner. In this paper we focus on similarity-based browsing and we introduce a novel method for computing the similarity between two entities of a given RDF/S graph. The distinctive characteristics of the proposed metric are that it is generic (it can be used to compare nodes of any kind), it takes into account the neighborhoods of the nodes, and it is configurable (with respect to the trade-off between accuracy and computational complexity). We demonstrate the behavior of the metric using examples from an application over LOD. Finally, we generalize and elaborate on implementation approaches, harmonized with the distributed nature of LOD, that can be used for computing the most similar entities using neighborhood-based similarity metrics.
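
To make the idea of a neighborhood-based, configurable similarity concrete, here is a minimal sketch. It is not the metric proposed in the paper: it substitutes plain Jaccard overlap of the nodes reachable within a given number of hops, with `radius` standing in for the accuracy vs. cost setting. The function names and the tiny `ex:` graph are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_adjacency(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Undirected adjacency over the nodes of a small triple set."""
    adj: Dict[str, Set[str]] = {}
    for s, _p, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def neighborhood(adj: Dict[str, Set[str]], node: str, radius: int) -> Set[str]:
    """Nodes reachable from `node` in at most `radius` hops, excluding it."""
    seen = {node}
    frontier = {node}
    for _ in range(radius):
        frontier = {n for f in frontier for n in adj.get(f, set())} - seen
        seen |= frontier
    return seen - {node}


def similarity(adj: Dict[str, Set[str]], a: str, b: str, radius: int = 1) -> float:
    """Jaccard overlap of the two neighborhoods (a swapped-in metric)."""
    na = neighborhood(adj, a, radius)
    nb = neighborhood(adj, b, radius)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Toy graph (illustrative only).
triples = [
    ("ex:Athens", "ex:locatedIn", "ex:Greece"),
    ("ex:Heraklion", "ex:locatedIn", "ex:Greece"),
    ("ex:Heraklion", "ex:locatedOn", "ex:Crete"),
    ("ex:Athens", "rdf:type", "ex:City"),
    ("ex:Heraklion", "rdf:type", "ex:City"),
    ("ex:Paris", "rdf:type", "ex:City"),
    ("ex:Paris", "ex:locatedIn", "ex:France"),
]
adj = build_adjacency(triples)
sim_greek = similarity(adj, "ex:Athens", "ex:Heraklion")
sim_mixed = similarity(adj, "ex:Athens", "ex:Paris")
```

A larger `radius` looks at more of the graph per comparison, so it costs more to compute; this sketch ignores predicates entirely, which a real metric over RDF/S data would likely take into account.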