Source author record

Yannis Tzitzikas

Yannis Tzitzikas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence; Databases; Digital Libraries; Computation and Language; Distributed, Parallel, and Cluster Computing; Information Retrieval; Sound

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators


preprint, 2026, arXiv

It's not the Language Model, it's the Tool: Deterministic Mediation for Scientific Workflows

Language models can produce convincing scientific analyses, but repeated generations on the same data do not guarantee the same result. A researcher may regenerate an identical query and receive a different fit, a different peak position or a different analysis procedure, without an obvious way to decide which output to trust. We propose typed mediation, a pattern in which the model orchestrates deterministic tools rather than generating analytical code. Each tool encodes one researcher's exact procedure for one instrument, ported through structured interviews. The model selects which tool to call and with what parameters. The tool produces the result. Regeneration does not change it. We evaluate this claim by running the same photoluminescence analysis on four platforms, including three commercial foundation models, four times each with the same prompt. The typed tool produces identical results across all runs. The commercial platforms either vary in numerical output and analytical methodology across runs, or fail to produce valid results on the task. We deploy this pattern on two instruments serving users over approximately six months, with very positive user feedback. Both cases are very challenging: they involve proprietary binary formats and per-seat licensed software, which force the tool to remain on local infrastructure alongside the data and the instrument it operates. We argue that deployment topology is not just a preference, but a structural requirement of scientific tool mediation. The result is a practical pattern for deploying language models in scientific workflows where reproducibility is mandatory, reducing analysis time from weeks to minutes while guaranteeing identical outputs across runs.
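A minimal sketch of the typed mediation pattern described in the abstract above, assuming a single hypothetical tool. The names `PeakParams`, `find_peak`, `TOOLS` and `mediate` are illustrative and do not come from the paper, and the argmax peak finder stands in for whatever procedure a real tool would encode. The point shown is only the division of labor: the model proposes a tool name and parameters, and a pure function produces the result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class PeakParams:
    """Typed parameters the model is allowed to choose (hypothetical)."""
    min_wavelength_nm: float
    max_wavelength_nm: float


def find_peak(spectrum: Sequence[Tuple[float, float]], params: PeakParams) -> float:
    """Return the wavelength of maximum intensity inside the window.

    Pure function: no randomness and no hidden state, so repeated calls
    with the same inputs return the same value.
    """
    window = [(w, i) for w, i in spectrum
              if params.min_wavelength_nm <= w <= params.max_wavelength_nm]
    if not window:
        raise ValueError("no samples inside the requested window")
    # Ties are broken by the lower wavelength so the result is fully determined.
    best_w, _ = max(window, key=lambda p: (p[1], -p[0]))
    return best_w


# Registry of deterministic tools the model may select from (hypothetical).
TOOLS: Dict[str, Callable] = {"find_peak": find_peak}


def mediate(tool_name: str, raw_params: dict, spectrum):
    """Validate a model-proposed call and run the deterministic tool."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    params = PeakParams(**raw_params)  # rejects missing or unexpected fields
    return TOOLS[tool_name](spectrum, params)
```

Calling `mediate("find_peak", {"min_wavelength_nm": 630.0, "max_wavelength_nm": 670.0}, spectrum)` several times on the same spectrum returns the same wavelength each time, because nothing in the call path is sampled.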

preprint, 2021, arXiv

CS563-QA: A Collection for Evaluating Question Answering Systems

Question Answering (QA) is a challenging topic since it requires tackling the various difficulties of natural language understanding. Since evaluation is important not only for identifying the strong and weak points of the various techniques for QA, but also for facilitating the inception of new methods and techniques, in this paper we present a collection for evaluating QA methods over free text that we have created. Although it is a small collection, it contains cases of increasing difficulty, therefore it has an educational value and it can be used for rapid evaluation of QA systems.

preprint, 2014, arXiv

Tasks that Require, or can Benefit from, Matching Blank Nodes

In various domains and cases, we observe the creation and usage of information elements which are unnamed. Such elements do not have a name, or may have a name that is not externally referable (usually meaningless and not persistent over time). This paper discusses why we will never 'escape' from the problem of having to construct mappings between such unnamed elements in information systems. Since unnamed elements nowadays occur very often in the framework of the Semantic Web and Linked Data as blank nodes, the paper describes scenarios that can benefit from methods that compute mappings between the unnamed elements. For each scenario, the corresponding bnode matching problem is formally defined. Based on this analysis, we try to reach a more general formulation of the problem, which can be useful for guiding the required technological advances. To this end, the paper finally discusses methods to realize blank node matching and the implementations that exist, and identifies open issues and challenges.

preprint, 2013, arXiv

A Simple Method to Produce Algorithmic MIDI Music based on Randomness, Simple Probabilities and Multi-Threading

This paper introduces a simple method for producing multichannel MIDI music that is based on randomness and simple probabilities. One distinctive feature of the method is that it produces and sends to the sound card, in parallel, more than one unsynchronized channel by exploiting the multi-threading capabilities of general purpose programming languages. As a consequence, the derived sound offers a quite "full" and "unpredictable" acoustic experience to the listener. Subsequently the paper reports the results of an evaluation with users. The results were very surprising: the majority of users responded that they could tolerate this music on various occasions.

preprint, 2012, arXiv

Information Carriers and Identification of Information Objects: An Ontological Approach

Even though library and archival practice, as well as Digital Preservation, have a long tradition in identifying information objects, the question of their precise identity under change of carrier or migration is still a riddle to science. The objective of this paper is to provide criteria for the unique identification of some important kinds of information objects, independent from the kind of carrier or specific encoding. Our approach is based on the idea that the substance of some kinds of information objects can completely be described in terms of discrete arrangements of finite numbers of known kinds of symbols, such as those implied by style guides for scientific journal submissions. Our theory is also useful for selecting or describing what has to be preserved. This is a fundamental problem since curators and archivists would like to formally record the decisions of what has to be preserved over time and to decide (or verify) whether a migration (transformation) preserves the intended information content. Furthermore, it is important for reasoning about the authenticity of digital objects, as well as for reducing the cost of digital preservation.

preprint, 2011, arXiv

Query processing in distributed, taxonomy-based information sources

We address the problem of answering queries over a distributed information system, storing objects indexed by terms organized in a taxonomy. The taxonomy consists of subsumption relationships between negation-free DNF formulas on terms and negation-free conjunctions of terms. In the first part of the paper, we consider the centralized case, deriving a hypergraph-based algorithm that is efficient in data complexity. In the second part of the paper, we consider the distributed case, presenting alternative ways of implementing the centralized algorithm. These alternatives follow from two basic criteria: direct vs. query re-writing evaluation, and centralized vs. distributed data or taxonomy allocation. Combinations of these criteria cover a wide spectrum of architectures, ranging from client-server to peer-to-peer. We evaluate the performance of the various architectures by simulation on a network with O(10^4) nodes, and derive final results. Finally, an extensive review of the relevant literature is included.
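To illustrate what answering a query over taxonomy-indexed objects means, here is a deliberately simplified sketch. It assumes plain term-to-term subsumption rather than the DNF formulas and conjunctions of the paper, and it is not the hypergraph-based algorithm the abstract mentions. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def narrower_terms(subsumptions: Iterable[Tuple[str, str]], term: str) -> Set[str]:
    """Return the term plus every term it subsumes, directly or transitively.

    Each pair in `subsumptions` is (narrow, broad), read as "narrow is subsumed by broad".
    """
    children = defaultdict(set)
    for narrow, broad in subsumptions:
        children[broad].add(narrow)
    result, stack = {term}, [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def answer(index: Dict[str, Set[str]],
           subsumptions: Iterable[Tuple[str, str]],
           term: str) -> Set[str]:
    """Objects indexed under the query term or under any narrower term."""
    objects: Set[str] = set()
    for t in narrower_terms(subsumptions, term):
        objects |= index.get(t, set())
    return objects
```

With `subsumptions = [("Salmon", "Fish"), ("Fish", "Animal")]`, a query for `"Animal"` collects the objects indexed under `"Animal"`, `"Fish"` and `"Salmon"`. In a distributed setting the index and the subsumption pairs could live on different nodes, which is the allocation choice the abstract describes.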

preprint, 2011, arXiv

Similarity-based Browsing over Linked Open Data

An increasing amount of data is published on the Web according to the Linked Open Data (LOD) principles. End users would like to browse these data in a flexible manner. In this paper we focus on similarity-based browsing and we introduce a novel method for computing the similarity between two entities of a given RDF/S graph. The distinctive characteristics of the proposed metric are that it is generic (it can be used to compare nodes of any kind), it takes into account the neighborhoods of the nodes, and it is configurable (with respect to the accuracy vs. computational complexity tradeoff). We demonstrate the behavior of the metric using examples from an application over LOD. Finally, we generalize and elaborate on implementation approaches harmonized with the distributed nature of LOD which can be used for computing the most similar entities using neighborhood-based similarity metrics.
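The abstract does not define the metric itself, so the following is only a stand-in showing what a neighborhood-based, configurable similarity can look like: Jaccard overlap of the nodes reachable within a chosen number of hops. The `radius` parameter, the function names and the sample triples are assumptions made for illustration; a larger radius looks at more of the graph at a higher cost.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def neighborhood(triples: List[Triple], node: str, radius: int) -> Set[str]:
    """Nodes reachable from `node` by following at most `radius` outgoing edges."""
    frontier, seen = {node}, set()
    for _ in range(radius):
        reached = {o for s, _, o in triples if s in frontier and o not in seen}
        seen |= reached
        frontier = reached
    return seen


def similarity(triples: List[Triple], a: str, b: str, radius: int = 1) -> float:
    """Jaccard overlap of the two neighborhoods (0.0 when both are empty)."""
    na, nb = neighborhood(triples, a, radius), neighborhood(triples, b, radius)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0
```

For two entities that share one of their two outgoing neighbors, for example both typed `Fish` but living in different habitats, the radius-1 score is one shared node out of three distinct ones.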
