Source author record

Tomasz Kajdanowicz

Tomasz Kajdanowicz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Social and Information Networks Information Retrieval physics.soc-ph Computation and Language Artificial Intelligence Computational Complexity Cryptography and Security Databases Distributed, Parallel, and Cluster Computing Human-Computer Interaction Performance

Catalog footprint

What is connected

13works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Geometry of Knowledge Allows Extending Diversity Boundaries of Large Language Models

Starting from the hypothesis that knowledge in semantic space is organized along structured manifolds, we argue that this geometric structure renders the space explorable. By traversing it and using the resulting continuous representations to condition an LLM's generation distribution, we can systematically expand the model's reachable semantic range. We introduce a framework that requires no modification of LLM parameters and operationalizes this idea by constructing a conditioning distribution from a small set of diverse anchor generations. This distribution conditions LLM's generation via an xRAG-style projector. Our experiments demonstrate that this manifold-based conditioning substantially increases generative diversity, with direct benefits for enhancing divergent thinking, a core facet of creativity, in language models.

preprint2022arXiv

Assessment of Massively Multilingual Sentiment Classifiers

Models are increasing in size and complexity in the hunt for SOTA. But what if those 2\% increase in performance does not make a difference in a production use case? Maybe benefits from a smaller, faster model outweigh those slight performance gains. Also, equally good performance across languages in multilingual tasks is more important than SOTA results on a single one. We present the biggest, unified, multilingual collection of sentiment analysis datasets. We use these to assess 11 models and 80 high-quality sentiment datasets (out of 342 raw datasets collected) in 27 languages and included results on the internally annotated datasets. We deeply evaluate multiple setups, including fine-tuning transformer-based models for measuring performance. We compare results in numerous dimensions addressing the imbalance in both languages coverage and dataset sizes. Finally, we present some best practices for working with such a massive collection of datasets and models from a multilingual perspective.

preprint2022arXiv

Dynamic pricing and discounts by means of interactive presentation systems in stationary point of sales

The main purpose of this article was to create a model and simulate the profitability conditions of an interactive presentation system (IPS) with the recommender system (RS) used in the kiosk. 90 million simulations have been run in Python with SymPy to address the problem of discount recommendation offered to the clients according to their usage of the IPS.

preprint2020arXiv

AttrE2vec: Unsupervised Attributed Edge Representation Learning

Representation learning has overcome the often arduous and manual featurization of networks through (unsupervised) feature learning as it results in embeddings that can apply to a variety of downstream learning tasks. The focus of representation learning on graphs has focused mainly on shallow (node-centric) or deep (graph-based) learning approaches. While there have been approaches that work on homogeneous and heterogeneous networks with multi-typed nodes and edges, there is a gap in learning edge representations. This paper proposes a novel unsupervised inductive method called AttrE2Vec, which learns a low-dimensional vector representation for edges in attributed networks. It systematically captures the topological proximity, attributes affinity, and feature similarity of edges. Contrary to current advances in edge embedding research, our proposal extends the body of methods providing representations for edges, capturing graph attributes in an inductive and unsupervised manner. Experimental results show that, compared to contemporary approaches, our method builds more powerful edge vector representations, reflected by higher quality measures (AUC, accuracy) in downstream tasks as edge classification and edge clustering. It is also confirmed by analyzing low-dimensional embedding projections.

preprint2020arXiv

Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections

Political campaigns are full of political ads posted by candidates on social media. Political advertisements constitute a basic form of campaigning, subjected to various social requirements. We present the first publicly open dataset for detecting specific text chunks and categories of political advertising in the Polish language. It contains 1,705 human-annotated tweets tagged with nine categories, which constitute campaigning under Polish electoral law. We achieved a 0.65 inter-annotator agreement (Cohen's kappa score). An additional annotator resolved the mismatches between the first two annotators improving the consistency and complexity of the annotation process. We used the newly created dataset to train a well established neural tagger (achieving a 70% percent points F1 score). We also present a possible direction of use cases for such datasets and models with an initial analysis of the Polish 2020 Presidential Elections on Twitter.

preprint2016arXiv

How is a data-driven approach better than random choice in label space division for multi-label classification?

We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurence graph (both weighted an unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use gini-index based Decision Trees as the base classifier. We compare educated approaches to label space divisions against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated guess approaches are more likely to outperform RAkELd than otherwise in all measures, but Hamming Loss. We show that fastgreedy and walktrap community detection methods on weighted label co-occurence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurence graphs is on average 90% of the times better than random paritioning in terms of Subset Accuracy and 89% when it comes to Jaccard similarity. Weighted fastgreedy is better on average than RAkELd when it comes to Hamming Loss.

preprint2015arXiv

Learning in Unlabeled Networks - An Active Learning and Inference Approach

The task of determining labels of all network nodes based on the knowledge about network structure and labels of some training subset of nodes is called the within-network classification. It may happen that none of the labels of the nodes is known and additionally there is no information about number of classes to which nodes can be assigned. In such a case a subset of nodes has to be selected for initial label acquisition. The question that arises is: "labels of which nodes should be collected and used for learning in order to provide the best classification accuracy for the whole network?". Active learning and inference is a practical framework to study this problem. A set of methods for active learning and inference for within network classification is proposed and validated. The utility score calculation for each node based on network structure is the first step in the process. The scores enable to rank the nodes. Based on the ranking, a set of nodes, for which the labels are acquired, is selected (e.g. by taking top or bottom N from the ranking). The new measure-neighbour methods proposed in the paper suggest not obtaining labels of nodes from the ranking but rather acquiring labels of their neighbours. The paper examines 29 distinct formulations of utility score and selection methods reporting their impact on the results of two collective classification algorithms: Iterative Classification Algorithm and Loopy Belief Propagation. We advocate that the accuracy of presented methods depends on the structural properties of the examined network. We claim that measure-neighbour methods will work better than the regular methods for networks with higher clustering coefficient and worse than regular methods for networks with low clustering coefficient. According to our hypothesis, based on clustering coefficient we are able to recommend appropriate active learning and inference method.

preprint2014arXiv

Seed Selection for Spread of Influence in Social Networks: Temporal vs. Static Approach

The problem of finding optimal set of users for influencing others in the social network has been widely studied. Because it is NP-hard, some heuristics were proposed to find sub-optimal solutions. Still, one of the commonly used assumption is the one that seeds are chosen on the static network, not the dynamic one. This static approach is in fact far from the real-world networks, where new nodes may appear and old ones dynamically disappear in course of time. The main purpose of this paper is to analyse how the results of one of the typical models for spread of influence - linear threshold - differ depending on the strategy of building the social network used later for choosing seeds. To show the impact of network creation strategy on the final number of influenced nodes - outcome of spread of influence, the results for three approaches were studied: one static and two temporal with different granularities, i.e. various number of time windows. Social networks for each time window encapsulated dynamic changes in the network structure. Calculation of various node structural measures like degree or betweenness respected these changes by means of forgetting mechanism - more recent data had greater influence on node measure values. These measures were, in turn, used for node ranking and their selection for seeding. All concepts were applied to experimental verification on five real datasets. The results revealed that temporal approach is always better than static and the higher granularity in the temporal social network while seeding, the more finally influenced nodes. Additionally, outdegree measure with exponential forgetting typically outperformed other time-dependent structural measures, if used for seed candidate ranking.

preprint2013arXiv

Label-dependent Feature Extraction in Social Networks for Node Classification

A new method of feature extraction in the social network for within-network classification is proposed in the paper. The method provides new features calculated by combination of both: network structure information and class labels assigned to nodes. The influence of various features on classification performance has also been studied. The experiments on real-world data have shown that features created owing to the proposed method can lead to significant improvement of classification accuracy.

preprint2013arXiv

Multidimensional Social Network in the Social Recommender System

All online sharing systems gather data that reflects users' collective behaviour and their shared activities. This data can be used to extract different kinds of relationships, which can be grouped into layers, and which are basic components of the multidimensional social network proposed in the paper. The layers are created on the basis of two types of relations between humans, i.e. direct and object-based ones which respectively correspond to either social or semantic links between individuals. For better understanding of the complexity of the social network structure, layers and their profiles were identified and studied on two, spanned in time, snapshots of the Flickr population. Additionally, for each layer, a separate strength measure was proposed. The experiments on the Flickr photo sharing system revealed that the relationships between users result either from semantic links between objects they operate on or from social connections of these users. Moreover, the density of the social network increases in time. The second part of the study is devoted to building a social recommender system that supports the creation of new relations between users in a multimedia sharing system. Its main goal is to generate personalized suggestions that are continuously adapted to users' needs depending on the personal weights assigned to each layer in the multidimensional social network. The conducted experiments confirmed the usefulness of the proposed model.

preprint2013arXiv

Parallel Processing of Large Graphs

More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of parallel paradigm for efficient computation. Three parallel techniques have been compared in the paper: MapReduce, its map-side join extension and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results revealed that iterative graph processing with the BSP implementation always and significantly, even up to 10 times outperforms MapReduce, especially for algorithms with many iterations and sparse communication. Also MapReduce extension based on map-side join usually noticeably presents better efficiency, although not as much as BSP. Nevertheless, MapReduce still remains the good alternative for enormous networks, whose data structures do not fit in local memories.

preprint2013arXiv

Privacy-preserving Data Mining, Sharing and Publishing

The goal of the paper is to present different approaches to privacy-preserving data sharing and publishing in the context of e-health care systems. In particular, the literature review on technical issues in privacy assurance and current real-life high complexity implementation of medical system that assumes proper data sharing mechanisms are presented in the paper.

preprint2013arXiv

Social Recommendations within the Multimedia Sharing Systems

The social recommender system that supports the creation of new relations between users in the multimedia sharing system is presented in the paper. To generate suggestions the new concept of the multirelational social network was introduced. It covers both direct as well as object-based relationships that reflect social and semantic links between users. The main goal of the new method is to create the personalized suggestions that are continuously adapted to users' needs depending on the personal weights assigned to each layer from the social network. The conducted experiments confirmed the usefulness of the proposed model.

Tomasz Kajdanowicz

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Geometry of Knowledge Allows Extending Diversity Boundaries of Large Language Models

Assessment of Massively Multilingual Sentiment Classifiers

Dynamic pricing and discounts by means of interactive presentation systems in stationary point of sales

AttrE2vec: Unsupervised Attributed Edge Representation Learning

Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections

How is a data-driven approach better than random choice in label space division for multi-label classification?

Learning in Unlabeled Networks - An Active Learning and Inference Approach

Seed Selection for Spread of Influence in Social Networks: Temporal vs. Static Approach

Label-dependent Feature Extraction in Social Networks for Node Classification

Multidimensional Social Network in the Social Recommender System

Parallel Processing of Large Graphs

Privacy-preserving Data Mining, Sharing and Publishing

Social Recommendations within the Multimedia Sharing Systems