Researcher profile

Georgios Paliouras

Georgios Paliouras contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Focused PU learning from imbalanced data

We propose a new method of learning from positive and unlabeled (PU) examples in highly imbalanced datasets. Many real-world problems, such as disease gene identification, targeted marketing, fraud detection, and recommender systems, are hard to address with machine learning methods, due to limited labeled data. Often, training data comprises positive and unlabeled instances, the latter typically being dominated by negative, but including also several positive instances. While PU learning is well-studied, few methods address imbalanced settings or hard-to-detect positive examples that resemble negative ones. Our approach uses a focused empirical risk estimator, incorporating both positive and unlabeled examples to train binary classifiers. Empirical evaluations demonstrate state-of-the-art performance on imbalanced datasets under two labeling mechanisms - selecting positives completely at random (SCAR) and selecting at random (SAR). Beyond these controlled experiments, we demonstrate the value of the proposed method in the real-world application of financial misstatement detection.

preprint2022arXiv

Learning Automata-Based Complex Event Patterns in Answer Set Programming

Complex Event Recognition and Forecasting (CER/F) techniques attempt to detect, or even forecast ahead of time, event occurrences in streaming input using predefined event patterns. Such patterns are not always known in advance, or they frequently change over time, making machine learning techniques, capable of extracting such patterns from data, highly desirable in CER/F. Since many CER/F systems use symbolic automata to represent such patterns, we propose a family of such automata where the transition-enabling conditions are defined by Answer Set Programming (ASP) rules, and which, thanks to the strong connections of ASP to symbolic learning, are directly learnable from data. We present such a learning approach in ASP and an incremental version thereof that trades optimality for efficiency and is capable to scale to large datasets. We evaluate our approach on two CER datasets and compare it to state-of-the-art automata learning techniques, demonstrating empirically a superior performance, both in terms of predictive accuracy and scalability.

preprint2022arXiv

Parallel Model Exploration for Tumor Treatment Simulations

Computational systems and methods are often being used in biological research, including the understanding of cancer and the development of treatments. Simulations of tumor growth and its response to different drugs are of particular importance, but also challenging complexity. The main challenges are first to calibrate the simulators so as to reproduce real-world cases, and second, to search for specific values of the parameter space concerning effective drug treatments. In this work, we combine a multi-scale simulator for tumor cell growth and a Genetic Algorithm (GA) as a heuristic search method for finding good parameter configurations in reasonable time. The two modules are integrated into a single workflow that can be executed in parallel on high performance computing infrastructures. In effect, the GA is used to calibrate the simulator, and then to explore different drug delivery schemes. Among these schemes, we aim to find those that minimize tumor cell size and the probability of emergence of drug resistant cells in the future. Experimental results illustrate the effectiveness and computational efficiency of the approach.

preprint2022arXiv

Predicting Intervention Approval in Clinical Trials through Multi-Document Summarization

Clinical trials offer a fundamental opportunity to discover new treatments and advance the medical knowledge. However, the uncertainty of the outcome of a trial can lead to unforeseen costs and setbacks. In this study, we propose a new method to predict the effectiveness of an intervention in a clinical trial. Our method relies on generating an informative summary from multiple documents available in the literature about the intervention under study. Specifically, our method first gathers all the abstracts of PubMed articles related to the intervention. Then, an evidence sentence, which conveys information about the effectiveness of the intervention, is extracted automatically from each abstract. Based on the set of evidence sentences extracted from the abstracts, a short summary about the intervention is constructed. Finally, the produced summaries are used to train a BERT-based classifier, in order to infer the effectiveness of an intervention. To evaluate our proposed method, we introduce a new dataset which is a collection of clinical trials together with their associated PubMed articles. Our experiments, demonstrate the effectiveness of producing short informative summaries and using them to predict the effectiveness of an intervention.

preprint2020arXiv

Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision

In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such semantic annotations do not adhere to the level of detail available in the domain knowledge and may not be sufficient to fulfil the information needs of experts in the domain. To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease. We test this method on the MeSH descriptors for two diseases: Alzheimer's Disease and Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a strong heuristic for automated subject annotation refinement and its use as weak supervision can lead to improved concept-level annotations. The fine-grained semantic annotations can enable more precise literature retrieval, sustain the semantic integration of subject annotations with other domain resources and ease the maintenance of consistent subject annotations, as new more detailed entries are added in the MeSH thesaurus over time.

preprint2020arXiv

iASiS Open Data Graph: Automated Semantic Integration of Disease-Specific Knowledge

In biomedical research, unified access to up-to-date domain-specific knowledge is crucial, as such knowledge is continuously accumulated in scientific literature and structured resources. Identifying and extracting specific information is a challenging task and computational analysis of knowledge bases can be valuable in this direction. However, for disease-specific analyses researchers often need to compile their own datasets, integrating knowledge from different resources, or reuse existing datasets, that can be out-of-date. In this study, we propose a framework to automatically retrieve and integrate disease-specific knowledge into an up-to-date semantic graph, the iASiS Open Data Graph. This disease-specific semantic graph provides access to knowledge relevant to specific concepts and their individual aspects, in the form of concept relations and attributes. The proposed approach is implemented as an open-source framework and applied to three diseases (Lung Cancer, Dementia, and Duchenne Muscular Dystrophy). Exemplary queries are presented, investigating the potential of this automatically generated semantic graph as a basis for retrieval and analysis of disease-specific knowledge.

preprint2020arXiv

Results of the seventh edition of the BioASQ Challenge

The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.

preprint2013arXiv

A Probabilistic Logic Programming Event Calculus

We present a system for recognising human activity given a symbolic representation of video content. The input of our system is a set of time-stamped short-term activities (STA) detected on video frames. The output is a set of recognised long-term activities (LTA), which are pre-defined temporal combinations of STA. The constraints on the STA that, if satisfied, lead to the recognition of a LTA, have been expressed using a dialect of the Event Calculus. In order to handle the uncertainty that naturally occurs in human activity recognition, we adapted this dialect to a state-of-the-art probabilistic logic programming framework. We present a detailed evaluation and comparison of the crisp and probabilistic approaches through experimentation on a benchmark dataset of human surveillance videos.