Source author record

Jakob Grue Simonsen

Jakob Grue Simonsen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Retrieval Computation and Language Machine Learning Artificial Intelligence Computational Complexity Logic in Computer Science Programming Languages Computer Science and Game Theory Data Structures and Algorithms Distributed, Parallel, and Cluster Computing econ.TH Human-Computer Interaction

Catalog footprint

What is connected

18works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs

Event knowledge graphs (EKG) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG discovery from uncurated data through a principled probabilistic framing based on the outcome space resulting from featured-derived partial orders on events. From this we derive an EKG discovery algorithm based on statistical inference rather than an ad hoc or heuristic-based strategy, or relying on manual analysis from domain experts. This approach comes at the computational cost of exploring a large, non-convex hypothesis space. In particular, solving the maximum likelihood term in our objective function involves counting the number of linear extensions of posets, which in general is #P-complete. Fortunately, bound estimates suffice for model comparison, and admit incorporation into a bespoke branch-and-bound algorithm. We establish an upper bound on our objective function which we show to be antitonic w.r.t. search depth for branching rules that are monotonic w.r.t. model inclusion. This allows pruning of large portions of the search space, which we show experimentally leads to rapid convergence toward optimal solutions that are consistent with manually built EKGs.

preprint2022arXiv

Fact Checking with Insufficient Evidence

Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions. First, we conduct an in-depth empirical analysis of the task with a new fluency-preserving method for omitting information from the evidence at the constituent and sentence level. We identify when models consider the remaining evidence (in)sufficient for FC, based on three trained models with different Transformer architectures and three FC datasets. Second, we ask annotators whether the omitted evidence was important for FC, resulting in a novel diagnostic dataset, SufficientFacts, for FC with omitted evidence. We find that models are least successful in detecting missing evidence when adverbial modifiers are omitted (21% accuracy), whereas it is easiest for omitted date modifiers (63% accuracy). Finally, we propose a novel data augmentation strategy for contrastive self-learning of missing evidence by employing the proposed omission method combined with tri-training. It improves performance for Evidence Sufficiency Prediction by up to 17.8 F1 score, which in turn improves FC performance by up to 2.6 F1 score.

preprint2020arXiv

A Complete Characterization of Infinitely Repeated Two-Player Games having Computable Strategies with no Computable Best Response under Limit-of-Means Payoff

It is well-known that for infinitely repeated games, there are computable strategies that have best responses, but no computable best responses. These results were originally proved for either specific games (e.g., Prisoner's dilemma), or for classes of games satisfying certain conditions not known to be both necessary and sufficient. We derive a complete characterization in the form of simple necessary and sufficient conditions for the existence of a computable strategy without a computable best response under limit-of-means payoff. We further refine the characterization by requiring the strategy profiles to be Nash equilibria or subgame-perfect equilibria, and we show how the characterizations entail that it is efficiently decidable whether an infinitely repeated game has a computable strategy without a computable best response.

preprint2020arXiv

Cons-free Programs and Complexity Classes between LOGSPACE and PTIME

Programming language concepts are used to give some new perspectives on a long-standing open problem: is logspace = ptime ?

preprint2020arXiv

Content-aware Neural Hashing for Cold-start Recommendation

Content-aware recommendation approaches are essential for providing meaningful recommendations for \textit{new} (i.e., \textit{cold-start}) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user-item relevance. NeuHash-CF is modelled as an autoencoder architecture, consisting of two joint hashing components for generating user and item hash codes. Inspired from semantic hashing, the item hashing component generates a hash code directly from an item's content information (i.e., it generates cold-start and seen item hash codes in the same manner). This contrasts existing state-of-the-art models, which treat the two item cases separately. The user hash codes are generated directly based on user id, through learning a user embedding matrix. We show experimentally that NeuHash-CF significantly outperforms state-of-the-art baselines by up to 12\% NDCG and 13\% MRR in cold-start recommendation settings, and up to 4\% in both NDCG and MRR in standard settings where all items are present while training. Our approach uses 2-4x shorter hash codes, while obtaining the same or better performance compared to the state of the art, thus consequently also enabling a notable storage reduction.

preprint2020arXiv

Encoding word order in complex embeddings

Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words, but not the ordered relationship (e.g., adjacency or precedence) between individual word positions. We present a novel and principled solution for modeling both the global absolute positions of words and their order relationships. Our solution generalizes word embeddings, previously defined as independent vectors, to continuous word functions over a variable (position). The benefit of continuous functions over variable positions is that word representations shift smoothly with increasing positions. Hence, word representations in different positions can correlate with each other in a continuous function. The general solution of these functions is extended to complex-valued domain due to richer representations. We extend CNN, RNN and Transformer NNs to complex-valued versions to incorporate our complex embedding (we make all code available). Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. To our knowledge, this is the first work in NLP to link imaginary numbers in complex-valued representations to concrete meanings (i.e., word order).

preprint2020arXiv

Factuality Checking in News Headlines with Eye Tracking

We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that predicts news headline factuality using only eye-tracking measurements. Our model yields a mean AUC of 0.688 and is better at detecting false than true headlines. Through a model analysis, we find that eye-tracking 25 users when reading 3-6 headlines is sufficient for our ensemble learner.

preprint2020arXiv

Generating Fact Checking Explanations

Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims. A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process -- generating justifications for verdicts on claims. This paper provides the first study of how these explanations can be generated automatically based on available claim context, and how this task can be modelled jointly with veracity prediction. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system. The results of a manual evaluation further suggest that the informativeness, coverage and overall quality of the generated explanations are also improved in the multi-task model.

preprint2020arXiv

Unsupervised Semantic Hashing with Pairwise Reconstruction

Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.

preprint2016arXiv

Adaptive Distributional Extensions to DFR Ranking

Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).

preprint2016arXiv

Complexity Hierarchies and Higher-Order Cons-Free Rewriting

Constructor rewriting systems are said to be cons-free if, roughly, constructor terms in the right-hand sides of rules are subterms of constructor terms in the left-hand side; the computational intuition is that rules cannot build new data structures. It is well-known that cons-free programming languages can be used to characterize computational complexity classes, and that cons-free first-order term rewriting can be used to characterize the set of polynomial-time decidable sets. We investigate cons-free higher-order term rewriting systems, the complexity classes they characterize, and how these depend on the order of the types used in the systems. We prove that, for every k $\geq$ 1, left-linear cons-free systems with type order k characterize E$^k$TIME if arbitrary evaluation is used (i.e., the system does not have a fixed reduction strategy). The main difference with prior work in implicit complexity is that (i) our results hold for non-orthogonal term rewriting systems with possible rule overlaps with no assumptions about reduction strategy, (ii) results for such term rewriting systems have previously only been obtained for k = 1, and with additional syntactic restrictions on top of cons-freeness and left-linearity. Our results are apparently among the first implicit characterizations of the hierarchy E = E$^1$TIME $\subseteq$ E$^2$TIME $\subseteq$ .... Our work confirms prior results that having full non-determinism (via overlaps of rules) does not directly allow characterization of non-deterministic complexity classes like NE. We also show that non-determinism makes the classes characterized highly sensitive to minor syntactic changes such as admitting product types or non-left-linear rules.

preprint2016arXiv

Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)

What if Information Retrieval (IR) systems did not just retrieve relevant information that is stored in their indices, but could also "understand" it and synthesise it into a single document? We present a preliminary study that makes a first step towards answering this question. Given a query, we train a Recurrent Neural Network (RNN) on existing relevant information to that query. We then use the RNN to "deep learn" a single, synthetic, and we assume, relevant document for that query. We design a crowdsourcing experiment to assess how relevant the "deep learned" document is, compared to existing relevant documents. Users are shown a query and four wordclouds (of three existing relevant documents and our deep learned synthetic document). The synthetic document is ranked on average most relevant of all.

preprint2016arXiv

Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval

Document coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where adjusting the ranking of documents according to both their relevance and their coherence has been shown to increase retrieval effectiveness [34,37]. The state of the art in unsupervised coherence modelling represents documents as bipartite graphs of sentences and discourse entities, and then projects these bipartite graphs into one-mode undirected graphs. However, one-mode projections may incur significant loss of the information present in the original bipartite structure. To address this we present three novel graph metrics that compute document coherence on the original bipartite graph of sentences and entities. Evaluation on standard settings shows that: (i) one of our coherence metrics beats the state of the art in terms of coherence accuracy; and (ii) all three of our coherence metrics improve retrieval effectiveness because, as closer analysis reveals, they capture aspects of document quality that go undetected by both keyword-based standard ranking and by spam filtering. This work contributes document coherence metrics that are theoretically principled, parameter-free, and useful to IR.

preprint2015arXiv

Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28] that represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, the application of which to IR complements recent work in the integration of document cohesiveness or comprehensibility to ranking [5, 56].

preprint2015arXiv

Near-optimal adjacency labeling scheme for power-law graphs

An adjacency labeling scheme is a method that assigns labels to the vertices of a graph such that adjacency between vertices can be inferred directly from the assigned label, without using a centralized data structure. We devise adjacency labeling schemes for the family of power-law graphs. This family that has been used to model many types of networks, e.g. the Internet AS-level graph. Furthermore, we prove an almost matching lower bound for this family. We also provide an asymptotically near- optimal labeling scheme for sparse graphs. Finally, we validate the efficiency of our labeling scheme by an experimental evaluation using both synthetic data and real-world networks of up to hundreds of thousands of vertices.

preprint2015arXiv

Non-Compositional Term Dependence for Information Retrieval

Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.

preprint2010arXiv

Infinitary Combinatory Reduction Systems: Normalising Reduction Strategies

We study normalising reduction strategies for infinitary Combinatory Reduction Systems (iCRSs). We prove that all fair, outermost-fair, and needed-fair strategies are normalising for orthogonal, fully-extended iCRSs. These facts properly generalise a number of results on normalising strategies in first-order infinitary rewriting and provide the first examples of normalising strategies for infinitary lambda calculus.

preprint2009arXiv

Infinitary Combinatory Reduction Systems: Confluence

We study confluence in the setting of higher-order infinitary rewriting, in particular for infinitary Combinatory Reduction Systems (iCRSs). We prove that fully-extended, orthogonal iCRSs are confluent modulo identification of hypercollapsing subterms. As a corollary, we obtain that fully-extended, orthogonal iCRSs have the normal form property and the unique normal form property (with respect to reduction). We also show that, unlike the case in first-order infinitary rewriting, almost non-collapsing iCRSs are not necessarily confluent.

Jakob Grue Simonsen

What is connected

Connect this record

See the researcher in context

Building this map preview

18 published item(s)

Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs

Fact Checking with Insufficient Evidence

A Complete Characterization of Infinitely Repeated Two-Player Games having Computable Strategies with no Computable Best Response under Limit-of-Means Payoff

Cons-free Programs and Complexity Classes between LOGSPACE and PTIME

Content-aware Neural Hashing for Cold-start Recommendation

Encoding word order in complex embeddings

Factuality Checking in News Headlines with Eye Tracking

Generating Fact Checking Explanations

Unsupervised Semantic Hashing with Pairwise Reconstruction

Adaptive Distributional Extensions to DFR Ranking

Complexity Hierarchies and Higher-Order Cons-Free Rewriting

Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)

Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval

Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

Near-optimal adjacency labeling scheme for power-law graphs

Non-Compositional Term Dependence for Information Retrieval

Infinitary Combinatory Reduction Systems: Normalising Reduction Strategies

Infinitary Combinatory Reduction Systems: Confluence