Source author record

Alan Hanjalic

Alan Hanjalic appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Retrieval, Computer Vision, Multimedia, physics.soc-ph, Artificial Intelligence, Computation and Language, cs.CY, Human-Computer Interaction, Machine Learning, physics.data-an, Social and Information Networks

Catalog footprint

What is connected

13 works

11 topics

4 close collaborators


preprint 2022 arXiv

Evaluating the Impact of Tiled User-Adaptive Real-Time Point Cloud Streaming on VR Remote Communication

Remote communication has rapidly become a part of everyday life in both professional and personal contexts. However, popular video conferencing applications present limitations in terms of quality of communication, immersion and social meaning. VR remote communication applications offer a greater sense of co-presence and mutual sensing of emotions between remote users. Previous research on these applications has shown that realistic point cloud user reconstructions offer better immersion and communication as compared to synthetic user avatars. However, photorealistic point clouds require a large volume of data per frame and are challenging to transmit over bandwidth-limited networks. Recent research has demonstrated significant improvements to perceived quality by optimizing the usage of bandwidth based on the position and orientation of the user's viewport with user-adaptive streaming. In this work, we developed a real-time VR communication application with an adaptation engine that features tiled user-adaptive streaming based on user behaviour. The application also supports traditional network adaptive streaming. The contribution of this work is to evaluate the impact of tiled user-adaptive streaming on quality of communication, visual quality, system performance and task completion in a functional live VR remote communication system. We perform a subjective evaluation with 33 users to compare the different streaming conditions with a neck exercise training task. As a baseline, we use uncompressed streaming requiring ca. 300 Mbps and our solution achieves similar visual quality with tiled adaptive streaming at 14 Mbps. We also demonstrate statistically significant gains to the quality of interaction and improvements to system performance and CPU consumption with tiled adaptive streaming as compared to the more traditional network adaptive streaming.

preprint 2022 arXiv

Topological-temporal properties of evolving networks

Many real-world complex systems including human interactions can be represented by temporal (or evolving) networks, where links activate or deactivate over time. Characterizing temporal networks is crucial to compare such systems and to study the dynamical processes unfolding on them. A systematic method to characterize simultaneously the temporal and topological relations of active links (also called contacts or events), in order to compare different real-world networks and to detect their common patterns or differences, is still missing. In this paper, we propose a method to characterize to what extent contacts that happen close in time also occur close in topology. Specifically, we study the interrelation between temporal and topological properties of contacts from three perspectives: (1) the autocorrelation of the time series recording the total number of contacts that happened at each time step in a network; (2) the interplay between the topological distance and interevent time of two contacts; (3) the temporal correlation of contacts within local neighborhoods beyond a node pair. By applying our method to 13 real-world temporal networks, we found that temporal-topological correlation of contacts is more evident in virtual contact networks than in physical contact ones. This could be due to the lower cost and easier access of online communications than physical interactions, allowing and possibly facilitating social contagion, i.e., interactions of one individual may influence the activity of its neighbors. We also identify different patterns between virtual and physical networks and among physical contact networks at, e.g., school and workplace, in the formation of correlation in local neighborhoods. Detected patterns and differences may further inspire the development of more realistic temporal network models that could jointly reproduce temporal and topological properties of contacts.
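Perspective (1) in the abstract above can be illustrated with a short sketch. It assumes contacts are given as `(u, v, t)` triples with integer time steps, and it uses the standard normalized sample autocorrelation; the function names and the exact normalization are illustrative assumptions, not necessarily the definition used in the paper.

```python
from collections import Counter


def contacts_per_step(contacts, num_steps):
    """Total number of contacts at each time step 0..num_steps-1.

    `contacts` is an iterable of (u, v, t) triples (assumed format).
    """
    counts = Counter(t for _, _, t in contacts)
    return [counts.get(t, 0) for t in range(num_steps)]


def autocorrelation(series, lag):
    """Normalized sample autocorrelation of `series` at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no correlation signal
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Toy network whose activity alternates between busy and idle steps.
toy_contacts = [(0, 1, 0), (1, 2, 0), (0, 1, 2), (2, 3, 2), (1, 3, 4), (0, 2, 4)]
toy_series = contacts_per_step(toy_contacts, num_steps=6)
```

On the toy series, activity at lag 2 is positively correlated and activity at lag 1 is negatively correlated, which is the kind of temporal regularity the first perspective is meant to expose.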

preprint 2021 arXiv

Leave No User Behind: Towards Improving the Utility of Recommender Systems for Non-mainstream Users

In a collaborative-filtering recommendation scenario, biases in the data will likely propagate in the learned recommendations. In this paper we focus on the so-called mainstream bias: the tendency of a recommender system to provide better recommendations to users who have a mainstream taste, as opposed to non-mainstream users. We propose NAECF, a conceptually simple but effective idea to address this bias. The idea consists of adding an autoencoder (AE) layer when learning user and item representations with text-based Convolutional Neural Networks. The AEs, one for the users and one for the items, serve as adversaries to the process of minimizing the rating prediction error when learning how to recommend. They enforce that the specific unique properties of all users and items are sufficiently well incorporated and preserved in the learned representations. These representations, extracted as the bottlenecks of the corresponding AEs, are expected to be less biased towards mainstream users, and to provide more balanced recommendation utility across all users. Our experimental results confirm these expectations, significantly improving the recommendations for non-mainstream users while maintaining the recommendation quality for mainstream users. Our results emphasize the importance of deploying extensive content-based features, such as online reviews, in order to better represent users and items to maximize the de-biasing effect.

preprint 2020 arXiv

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification, the first one mapping image and text instances into a common embedding space for distance measuring, and the second one regarding image-text matching as a binary classification problem. Neither of these approaches can, however, balance the matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring an additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves state-of-the-art matching performance with much lower time complexity. The implementation code is available at https://github.com/Wangt-CN/MTFN-RR-PyTorch-Code.

preprint 2020 arXiv

Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding

This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values, but leaves others the same. The result is a partially synthetic data set that can be used for recommendation but contains less specific information about individual user preferences. Synthetic data has the potential to be useful for companies that are interested in releasing data to allow outside parties to develop new recommender algorithms, e.g., in the case of a recommender system challenge, while also reducing the risks associated with data misappropriation. Our experiments run a set of recommender system algorithms on our partially synthetic data sets as well as on the original data. The results show that the relative performance of the algorithms on the partially synthetic data reflects the relative performance on the original data. Further analysis demonstrates that properties of the original data are preserved under synthesis, but that certain example attributes accessible in the original data are hidden in the synthesized data.

preprint 2020 arXiv

S2IGAN: Speech-to-Image Generation via Adversarial Learning

An estimated half of the world's languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies. In this paper, a speech-to-image generation (S2IG) framework is proposed which translates speech descriptions to photo-realistic images without using any text information, thus allowing unwritten languages to potentially benefit from this technology. The proposed S2IG framework, named S2IGAN, consists of a speech embedding network (SEN) and a relation-supervised densely-stacked generative model (RDG). SEN learns the speech embedding with the supervision of the corresponding visual information. Conditioned on the speech embedding produced by SEN, the proposed RDG synthesizes images that are semantically consistent with the corresponding speech descriptions. Extensive experiments on two public benchmark datasets CUB and Oxford-102 demonstrate the effectiveness of the proposed S2IGAN on synthesizing high-quality and semantically-consistent images from the speech signal, yielding a good performance and a solid baseline for the S2IG task.

preprint 2016 arXiv

Geo-distinctive Visual Element Matching for Location Estimation of Images

We propose an image representation and matching approach that substantially improves visual-based location estimation for images. The main novelty of the approach, called distinctive visual element matching (DVEM), is its use of representations that are specific to the query image whose location is being predicted. These representations are based on visual element clouds, which robustly capture the connection between the query and visual evidence from candidate locations. We then maximize the influence of visual elements that are geo-distinctive because they do not occur in images taken at many other locations. We carry out experiments and analysis for both geo-constrained and geo-unconstrained location estimation cases using two large-scale, publicly-available datasets: the San Francisco Landmark dataset with 1.06 million street-view images and the MediaEval '15 Placing Task dataset with 5.6 million geo-tagged images from Flickr. We present examples that illustrate the highly-transparent mechanics of the approach, which are based on common sense observations about the visual patterns in image collections. Our results show that the proposed method delivers a considerable performance improvement compared to the state of the art.

preprint 2016 arXiv

Learning Subclass Representations for Visually-varied Image Classification

In this paper, we present a subclass-representation approach that predicts the probability of a social image belonging to one particular class. We explore the co-occurrence of user-contributed tags to find subclasses with a strong connection to the top level class. We then project each image onto the resulting subclass space to generate a subclass representation for the image. The novelty of the approach is that subclass representations make use of not only the content of the photos themselves, but also information on the co-occurrence of their tags, which determines membership in both subclasses and top-level classes. The novelty is also that the images are classified into smaller classes, which have a chance of being more visually stable and easier to model. These subclasses are used as a latent space and images are represented in this space by their probability of relatedness to all of the subclasses. In contrast to approaches directly modeling each top-level class based on the image content, the proposed method can exploit more information for visually diverse classes. The approach is evaluated on a set of 2 million photos with 10 classes, released by the Multimedia 2013 Yahoo! Large-scale Flickr-tag Image Classification Grand Challenge. Experiments show that the proposed system delivers sound performance for visually diverse classes compared with methods that directly model top classes.

preprint 2016 arXiv

Where to be wary: The impact of widespread photo-taking and image enhancement practices on users' geo-privacy

Today's geo-location estimation approaches are able to infer the location of a target image using its visual content alone. These approaches exploit visual matching techniques, applied to a large collection of background images with known geo-locations. Users who are unaware that visual retrieval approaches can compromise their geo-privacy unwittingly open themselves to risks of crime or other unintended consequences. Private photo sharing is not able to protect users effectively, since its inconvenience is a barrier to consistent use, and photos can still fall into the wrong hands if they are re-shared. This paper lays the groundwork for a new approach to geo-privacy of social images: instead of requiring a complete change of user behavior, we investigate the protection potential latent in users' existing practices. We carry out a series of retrieval experiments using a large collection of social images (8.5M) to systematically analyze where users should be wary, and how both photo taking and editing practices impact the performance of geo-location estimation. We find that practices that are currently widespread are already sufficient on their own to protect the geo-location of (to 'geo-cloak') up to more than 50% of images whose location would otherwise be automatically predictable. Our conclusion is that protecting users against the unwanted effects of visual retrieval is a viable research field, and should take as its starting point existing user practices.

preprint 2014 arXiv

Corpus Development for Affective Video Indexing

Affective video indexing is the area of research that develops techniques to automatically generate descriptions of video content that encode the emotional reactions which the video content evokes in viewers. This paper provides a set of corpus development guidelines based on state-of-the-art practice intended to support researchers in this field. Affective descriptions can be used for video search and browsing systems offering users affective perspectives. The paper is motivated by the observation that affective video indexing has yet to fully profit from the standard corpora (data sets) that have benefited conventional forms of video indexing. Affective video indexing faces unique challenges, since viewer-reported affective reactions are difficult to assess. Moreover, affect assessment efforts must be carefully designed in order to both cover the types of affective responses that video content evokes in viewers and also capture the stable and consistent aspects of these responses. We first present background information on affect and multimedia and related work on affective multimedia indexing, including existing corpora. Three dimensions emerge as critical for affective video corpora, and form the basis for our proposed guidelines: the context of viewer response, personal variation among viewers, and the effectiveness and efficiency of corpus creation. Finally, we present examples of three recent corpora and discuss how these corpora make progressive steps towards fulfilling the guidelines.

preprint 2014 arXiv

Heterogeneous Recovery Rates against SIS Epidemics in Directed Networks

The nodes in communication networks are most likely equipped with different recovery resources, which allow them to recover from a virus at different rates. In this paper, we aim to understand how to allocate the limited recovery resources to efficiently prevent the spreading of epidemics. We study the susceptible-infected-susceptible (SIS) epidemic model on directed scale-free networks. In the classic SIS model, a susceptible node can be infected by an infected neighbor with the infection rate $\beta$ and an infected node can recover to be susceptible again with the recovery rate $\delta$. In the steady state a fraction $y_\infty$ of nodes are infected, which shows how severely the network is infected. We propose to allocate the recovery rate $\delta_i$ for node $i$ according to its indegree and outdegree, $\delta_i \sim k_{i,in}^{\alpha_{in}} k_{i,out}^{\alpha_{out}}$, given the finite average recovery rate $\langle\delta\rangle$ representing the limited recovery resources over the whole network. We find that, by tuning the two scaling exponents $\alpha_{in}$ and $\alpha_{out}$, we can always reduce the infection fraction $y_\infty$, thus reducing the extent of infections, compared to the homogeneous allocation of recovery rates. Moreover, we can find our optimal strategy via the optimal choice of the exponents $\alpha_{in}$ and $\alpha_{out}$. Our optimal strategy indicates that when the recovery resources are sufficient, more resources should be allocated to the nodes with a larger indegree or outdegree, but when the recovery resources are very limited, only the nodes with a larger outdegree should be equipped with more resources. We also find that our optimal strategy works better when the recovery resources are sufficient but not yet able to make the epidemic die out, and when the indegree-outdegree correlation is small.
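The allocation rule in the abstract above, recovery rates proportional to a product of degree powers under a fixed network-wide average, can be sketched as follows. This is a minimal illustration assuming every node has positive indegree and outdegree (a zero degree with a negative exponent is undefined); the function name and the normalization step are assumptions made for illustration, not the authors' code.

```python
def allocate_recovery_rates(in_degrees, out_degrees, alpha_in, alpha_out, mean_delta):
    """Return per-node recovery rates delta_i proportional to
    k_in**alpha_in * k_out**alpha_out, rescaled so that their average
    equals mean_delta (the limited recovery budget per node).

    Assumes all degrees are positive (illustrative restriction).
    """
    if len(in_degrees) != len(out_degrees):
        raise ValueError("degree lists must have the same length")
    weights = [
        (k_in ** alpha_in) * (k_out ** alpha_out)
        for k_in, k_out in zip(in_degrees, out_degrees)
    ]
    # Rescale so that sum(delta) == mean_delta * number_of_nodes.
    scale = mean_delta * len(weights) / sum(weights)
    return [scale * w for w in weights]


# Three nodes with indegrees 1, 2, 4 and equal outdegrees.
homogeneous = allocate_recovery_rates([1, 2, 4], [1, 1, 1], 0, 0, mean_delta=1.0)
by_indegree = allocate_recovery_rates([1, 2, 4], [1, 1, 1], 1, 0, mean_delta=1.0)
```

Setting both exponents to zero recovers the homogeneous allocation used as the comparison baseline, while a positive `alpha_in` shifts the same total budget toward nodes with larger indegree.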

preprint 2013 arXiv

Exploiting Social Tags for Cross-Domain Collaborative Filtering

One of the most challenging problems in recommender systems based on the collaborative filtering (CF) concept is data sparseness, i.e., limited user preference data is available for making recommendations. Cross-domain collaborative filtering (CDCF) has been studied as an effective mechanism to alleviate data sparseness of one domain using the knowledge about user preferences from other domains. A key question to be answered in the context of CDCF is what common characteristics can be deployed to link different domains for effective knowledge transfer. In this paper, we assess the usefulness of user-contributed (social) tags in this respect. We do so by means of the Generalized Tag-induced Cross-domain Collaborative Filtering (GTagCDCF) approach that we propose in this paper and that we developed based on the general collective matrix factorization framework. Assessment is done by a series of experiments, using publicly available CF datasets that represent three cross-domain cases, i.e., two two-domain cases and one three-domain case. A comparative analysis on two-domain cases involving GTagCDCF and several state-of-the-art CDCF approaches indicates the increased benefit of using social tags as representatives of explicit links between domains for CDCF as compared to the implicit links deployed by the existing CDCF methods. In addition, we show that users from different domains can already benefit from GTagCDCF if they only share a few common tags. Finally, we use the three-domain case to validate the robustness of GTagCDCF with respect to the scale of datasets and the varying number of domains.

preprint 2013 arXiv

GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains

Recommender systems are frequently used in domains in which users express their preferences in the form of graded judgments, such as ratings. If accurate top-N recommendation lists are to be produced for such graded relevance domains, it is critical to generate a ranked list of recommended items directly rather than predicting ratings. Current techniques choose one of two sub-optimal approaches: either they optimize for a binary metric such as Average Precision, which discards information on relevance grades, or they optimize for Normalized Discounted Cumulative Gain (NDCG), which ignores the dependence of an item's contribution on the relevance of more highly ranked items. In this paper, we address the shortcomings of existing approaches by proposing the Graded Average Precision factor model (GAPfm), a latent factor model that is particularly suited to the problem of top-N recommendation in domains with graded relevance data. The model optimizes for Graded Average Precision (GAP), a metric that has been proposed recently for assessing the quality of ranked result lists for graded relevance. GAPfm learns a latent factor model by directly optimizing a smoothed approximation of GAP. GAPfm's advantages are twofold: it maintains full information about graded relevance and also addresses the limitations of models that optimize NDCG. Experimental results show that GAPfm achieves substantial improvements on the top-N recommendation task, compared to several state-of-the-art approaches. In order to ensure that GAPfm is able to scale to very large data sets, we propose a fast learning algorithm that uses an adaptive item selection strategy. A final experiment shows that GAPfm is useful not only for generating recommendation lists, but also for ranking a given list of rated items.
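The first shortcoming named in the abstract above, that a binary metric discards relevance grades, can be seen in a small sketch. It uses the textbook definitions of Average Precision and NDCG (with the common exponential gain `2**grade - 1`), chosen here as assumptions for illustration; it does not implement GAP or GAPfm.

```python
import math


def average_precision(grades, threshold=1):
    """Binary Average Precision of a ranked list of relevance grades.

    An item counts as relevant when its grade is >= threshold, so all
    information about how relevant it is gets discarded.
    """
    hits = 0
    precision_sum = 0.0
    for rank, grade in enumerate(grades, start=1):
        if grade >= threshold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def dcg(grades):
    """Discounted Cumulative Gain with exponential gain 2**grade - 1."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades, start=1)
    )


def ndcg(grades):
    """DCG normalized by the DCG of the ideally ordered list."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0


# Two rankings of the same items; grades are listed in ranked order.
best_first = [5, 1, 0]
best_second = [1, 5, 0]
```

Both rankings receive the same binary Average Precision because each places its two relevant items in the top two positions, whereas NDCG prefers `best_first` for ranking the grade-5 item above the grade-1 item.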
