Source author record

Martha Larson

Martha Larson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Retrieval, Computer Vision, Multimedia, Computation and Language, Artificial Intelligence, cs.CY, Human-Computer Interaction, Machine Learning, Social and Information Networks

Catalog footprint

What is connected

18 works

9 topics

4 close collaborators

preprint, 2026, arXiv

Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

The computational cost of training a vision-language model (VLM) can be reduced by sampling the training data. Previous work on efficient VLM pre-training has pointed to the importance of semantic data balance, adjusting the distribution of topics in the data to improve VLM accuracy. However, existing efficient pre-training approaches may disproportionately remove rare concepts from the training corpus. As a result, long-tail concepts remain insufficiently represented in the training data and are not effectively captured during training. In this work, we introduce a dynamic cluster-based sampling approach (DynamiCS) that downsamples large clusters of data and upsamples small ones. The approach is dynamic in that it applies sampling at each epoch. We first show the importance of dynamic sampling for VLM training. Then, we demonstrate the advantage of our cluster-scaling approach, which maintains the relative order of semantic clusters in the data and emphasizes the long-tail. This approach contrasts with current work, which focuses only on flattening the semantic distribution of the data. Our experiments show that DynamiCS reduces the computational cost of VLM training and provides a performance advantage for long-tail concepts.
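The idea of downsampling large clusters and upsampling small ones while keeping their relative order can be illustrated with a minimal sketch. The abstract does not give the scaling rule, so the power-law budget `size ** alpha` below is an assumption chosen only because it is monotonic (order-preserving) and compresses the head; the names `cluster_budgets`, `sample_epoch`, and `alpha` are hypothetical and not taken from the paper.

```python
import random


def cluster_budgets(cluster_sizes, total_budget, alpha=0.5):
    """Allocate a per-cluster sample count proportional to size ** alpha.

    ASSUMPTION: the power-law rule is illustrative, not the paper's formula.
    With 0 < alpha < 1 larger clusters still receive more samples than
    smaller ones, but their share of the budget shrinks.
    """
    weights = {c: n ** alpha for c, n in cluster_sizes.items()}
    norm = sum(weights.values())
    return {c: max(1, round(total_budget * w / norm)) for c, w in weights.items()}


def sample_epoch(clusters, total_budget, alpha=0.5, rng=None):
    """Draw a fresh training subset; call once per epoch for dynamic sampling."""
    rng = rng or random.Random()
    sizes = {c: len(items) for c, items in clusters.items()}
    budgets = cluster_budgets(sizes, total_budget, alpha)
    epoch = []
    for c, items in clusters.items():
        k = budgets[c]
        if k <= len(items):
            # Large cluster: downsample without replacement.
            epoch.extend(rng.sample(items, k))
        else:
            # Small cluster: upsample with replacement.
            epoch.extend(rng.choices(items, k=k))
    rng.shuffle(epoch)
    return epoch
```

Calling `sample_epoch` again at the start of every epoch gives each epoch a different subset of the large clusters, which is the sense of "dynamic" used in the abstract.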

preprint, 2026, arXiv

Frequency Is What You Need: Considering Word Frequency When Text Masking Benefits Vision-Language Model Pre-training

Vision Language Models (VLMs) can be trained more efficiently if training sets can be reduced in size. Recent work has shown the benefits of masking text during VLM training using a variety of strategies (truncation, random masking, block masking and syntax masking) and has reported syntax masking as the top performer. In this paper, we analyze the impact of different text masking strategies on the word frequency in the training data, and show that this impact is connected to model success. This finding motivates Contrastive Language-Image Pre-training with Word Frequency Masking (CLIPF), our proposed masking approach, which directly leverages word frequency. Extensive experiments demonstrate the advantages of CLIPF over syntax masking and other existing approaches, particularly when the number of input tokens decreases. We show that not only CLIPF, but also other existing masking strategies, outperform syntax masking when enough epochs are used during training, a finding of practical importance for selecting a text masking method for VLM training. Our code is available online.
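As a rough illustration of masking that depends on word frequency, the sketch below drops frequent words with higher probability before truncating a caption to a token limit. The keep probability `sqrt(threshold / frequency)` is an assumption borrowed from common subsampling practice; the abstract does not state the formula CLIPF uses, and the names `word_frequencies`, `keep_probability`, `mask_caption`, and `threshold` are hypothetical.

```python
import math
import random
from collections import Counter


def word_frequencies(captions):
    """Relative frequency of each whitespace-separated word in the corpus."""
    counts = Counter(w for c in captions for w in c.lower().split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def keep_probability(freq, threshold=1e-3):
    """ASSUMED rule: rare words are always kept, frequent words are thinned."""
    return min(1.0, math.sqrt(threshold / freq))


def mask_caption(caption, freqs, max_tokens, threshold=1e-3, rng=None):
    """Drop words by frequency, then truncate to at most max_tokens words."""
    rng = rng or random.Random()
    words = caption.lower().split()
    kept = [
        w for w in words
        # Words not seen in the corpus are treated as rare and always kept.
        if rng.random() < keep_probability(freqs.get(w, threshold), threshold)
    ]
    return kept[:max_tokens]
```

Compared with plain truncation, this kind of masking changes which words survive a tight token limit: very frequent words are removed first, so the remaining tokens are more likely to be content-bearing.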

preprint, 2026, arXiv

Revealing the Impact of Visual Text Style on Attribute-based Descriptions Produced by Large Visual Language Models

When the visual style of text is considered, a wide variety can be observed in font, color, and size. However, when a word is read, its meaning is independent of the style in which it has been written or rendered. In this paper, we investigate whether, and how, the style in which a word is visualized in an image impacts the description that a Large Visual Language Model (LVLM) provides for the concept to which that word refers. Specifically, we investigate how functional text styles (readability-oriented, e.g., black sans-serif) versus decorative styles (display-oriented, e.g., colored cursive/script) affect LVLMs' descriptions of a concept in terms of the attributes of that concept. Our experiments study the situation in which the LVLM is able to correctly identify the concept referred to by a visual text, i.e., by a word or words rendered as an image, and in which the visual text style should not influence the attribute-based description that the LVLM produces. Our experimental results reveal that even when the concept is correctly identified, text style influences the model's attribute-based descriptions of the concept. Our findings demonstrate non-trivial style leakage from text style into semantic inference and motivate style-aware evaluation and mitigation for LVLM-based multimedia systems.

preprint, 2026, arXiv

Talking to Extraordinary Objects: Folktales Offer Analogies for Interacting with Technology

Speech and language are valuable for interacting with technology. It would be ideal to be able to decouple their use from anthropomorphization, which has recently met an important moment of reckoning. In the world of folktales, language is everywhere and talking to extraordinary objects is not unusual. This overview presents examples of the analogies that folktales offer. Extraordinary objects in folktales are diverse and also memorable. Language capacity and intelligence are not always connected to humanness. Consideration of folktales can offer inspiration and insight for using speech and language for interacting with technology.

preprint, 2022, arXiv

Gender In Gender Out: A Closer Look at User Attributes in Context-Aware Recommendation

This paper studies user attributes in light of current concerns in the recommender system community: diversity, coverage, calibration, and data minimization. In experiments with a conventional context-aware recommender system that leverages side information, we show that user attributes do not always improve recommendation. Then, we demonstrate that user attributes can negatively impact diversity and coverage. Finally, we investigate the amount of information about users that "survives" from the training data into the recommendation lists produced by the recommender. This information is a weak signal that could in the future be exploited for calibration or studied further as a privacy leak.

preprint, 2022, arXiv

Minimizing Mindless Mentions: Recommendation with Minimal Necessary User Reviews

Recently, researchers have turned their attention to recommender systems that use only minimal necessary data. This trend is informed by the idea that recommender systems should use no more user interactions than are needed in order to provide users with useful recommendations. In this position paper, we make the case for applying the idea of minimal necessary data to recommender systems that use user reviews. We argue that the content of individual user reviews should be subject to minimization. Specifically, reviews used as training data to generate recommendations or reviews used to help users decide on purchases or consumption should be automatically edited to contain only the information that is needed.

preprint, 2022, arXiv

Regex in a Time of Deep Learning: The Role of an Old Technology in Age Discrimination Detection in Job Advertisements

Deep learning holds great promise for detecting discriminatory language in the public sphere. However, for the detection of illegal age discrimination in job advertisements, regex approaches are still strong performers. In this paper, we investigate job advertisements in the Netherlands. We present a qualitative analysis of the benefits of the 'old' approach based on regexes and investigate how neural embeddings could address its limitations.
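A regex-based detector of the kind discussed here can be sketched in a few lines. The patterns below are hypothetical English examples written for illustration; the paper studies job advertisements in the Netherlands and its actual patterns are not given in the abstract.

```python
import re

# HYPOTHETICAL patterns: explicit age ranges and explicit upper age limits.
AGE_PATTERNS = [
    re.compile(r"\b(?:under|younger than|maximum age(?: of)?)\s+\d{2}\b", re.IGNORECASE),
    re.compile(r"\b\d{2}\s*(?:-|to)\s*\d{2}\s+years?(?:\s+old)?\b", re.IGNORECASE),
]


def flag_age_terms(text):
    """Return every substring of text matched by one of the age patterns."""
    hits = []
    for pattern in AGE_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

Each hit maps back to a readable pattern, which makes the decision easy to inspect. The same sketch also shows the limitation: a phrase such as "under 30 minutes" would be flagged, and implicit wording that no pattern anticipates is missed, which is where the abstract suggests neural embeddings could help.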

preprint, 2020, arXiv

Adversarial Color Enhancement: Generating Unrestricted Adversarial Images by Optimizing a Color Filter

We introduce an approach that enhances images using a color filter in order to create adversarial effects, which fool neural networks into misclassification. Our approach, Adversarial Color Enhancement (ACE), generates unrestricted adversarial images by optimizing the color filter via gradient descent. The novelty of ACE is its incorporation of established practice for image enhancement in a transparent manner. Experimental results validate the white-box adversarial strength and black-box transferability of ACE. A range of examples demonstrates the perceptual quality of images that ACE produces. ACE makes an important contribution to recent work that moves beyond $L_p$ imperceptibility and focuses on unrestricted adversarial modifications that yield large perceptible perturbations but remain non-suspicious to the human eye. The future potential of filter-based adversaries is also explored in two directions: guiding ACE with common enhancement practices (e.g., Instagram filters) towards specific attractive image styles and adapting ACE to image semantics. Code is available at https://github.com/ZhengyuZhao/ACE.

preprint, 2020, arXiv

Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding

This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values, but leaves others the same. The result is a partially synthetic data set that can be used for recommendation but contains less specific information about individual user preferences. Synthetic data has the potential to be useful for companies that are interested in releasing data to allow outside parties to develop new recommender algorithms, i.e., in the case of a recommender system challenge, while also reducing the risks associated with data misappropriation. Our experiments run a set of recommender system algorithms on our partially synthetic data sets as well as on the original data. The results show that the relative performance of the algorithms on the partially synthetic data reflects the relative performance on the original data. Further analysis demonstrates that properties of the original data are preserved under synthesis, but that, for certain examples, attributes accessible in the original data are hidden in the synthesized data.

preprint, 2020, arXiv

Remembering Winter Was Coming: Character-Oriented Video Summaries of TV Series

Today's popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive use of summaries to re-engage with the plot. Automatic generation of video summaries of TV series' complex stories requires, first, modeling the dynamics of the plot and, second, extracting relevant sequences. In this paper, we tackle plot modeling by considering the social network of interactions between the characters involved in the narrative: substantial, durable changes in a major character's social environment suggest a new development relevant for the summary. Once identified, these major stages in each character's storyline can be used as a basis for completing the summary with related sequences. Our algorithm combines such social network analysis with filmmaking grammar to automatically generate character-oriented video summaries of TV series from partially annotated data. We carry out evaluation with a user study in a real-world scenario: a large sample of viewers were asked to rank video summaries centered on five characters of the popular TV series Game of Thrones, a few weeks before the new, sixth season was released. Our results reveal the ability of character-oriented summaries to re-engage viewers in television series and confirm the contributions of modeling the plot content and exploiting stylistic patterns to identify salient sequences.

preprint, 2020, arXiv

Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance

The success of image perturbations that are designed to fool image classifiers is assessed in terms of both adversarial effect and visual imperceptibility. The conventional assumption on imperceptibility is that perturbations should strive for tight $L_p$-norm bounds in RGB space. In this work, we drop this assumption by pursuing an approach that exploits human color perception, and more specifically, minimizing perturbation size with respect to perceptual color distance. Our first approach, Perceptual Color distance C&W (PerC-C&W), extends the widely-used C&W approach and produces larger RGB perturbations. PerC-C&W is able to maintain adversarial strength, while contributing to imperceptibility. Our second approach, Perceptual Color distance Alternating Loss (PerC-AL), achieves the same outcome, but does so more efficiently by alternating between the classification loss and perceptual color difference when updating perturbations. Experimental evaluation shows PerC approaches outperform conventional $L_p$ approaches in terms of robustness and transferability, and also demonstrates that the PerC distance can provide added value on top of existing structure-based methods for creating image perturbations.

preprint, 2016, arXiv

Exploring Deep Space: Learning Personalized Ranking in a Semantic Space

Recommender systems leverage both content and user interactions to generate recommendations that fit users' preferences. The recent surge of interest in deep learning presents new opportunities for exploiting these two sources of information. To recommend items we propose to first learn a user-independent high-dimensional semantic space in which items are positioned according to their substitutability, and then learn a user-specific transformation function to transform this space into a ranking according to the user's past preferences. An advantage of the proposed architecture is that it can be used to effectively recommend items using either content that describes the items or user-item ratings. We show that this approach significantly outperforms state-of-the-art recommender systems on the MovieLens 1M dataset.

preprint, 2016, arXiv

Learning Subclass Representations for Visually-varied Image Classification

In this paper, we present a subclass-representation approach that predicts the probability of a social image belonging to one particular class. We explore the co-occurrence of user-contributed tags to find subclasses with a strong connection to the top-level class. We then project each image onto the resulting subclass space to generate a subclass representation for the image. The novelty of the approach is that subclass representations make use of not only the content of the photos themselves, but also information on the co-occurrence of their tags, which determines membership in both subclasses and top-level classes. The novelty is also that the images are classified into smaller classes, which have a chance of being more visually stable and easier to model. These subclasses are used as a latent space and images are represented in this space by their probability of relatedness to all of the subclasses. In contrast to approaches directly modeling each top-level class based on the image content, the proposed method can exploit more information for visually diverse classes. The approach is evaluated on a set of 2 million photos with 10 classes, released by the Multimedia 2013 Yahoo! Large-scale Flickr-tag Image Classification Grand Challenge. Experiments show that the proposed system delivers sound performance for visually diverse classes compared with methods that directly model top classes.

preprint, 2016, arXiv

Where to be wary: The impact of widespread photo-taking and image enhancement practices on users' geo-privacy

Today's geo-location estimation approaches are able to infer the location of a target image using its visual content alone. These approaches exploit visual matching techniques, applied to a large collection of background images with known geo-locations. Users who are unaware that visual retrieval approaches can compromise their geo-privacy unwittingly open themselves to risks of crime or other unintended consequences. Private photo sharing is not able to protect users effectively, since its inconvenience is a barrier to consistent use, and photos can still fall into the wrong hands if they are re-shared. This paper lays the groundwork for a new approach to geo-privacy of social images: instead of requiring a complete change of user behavior, we investigate the protection potential latent in users' existing practices. We carry out a series of retrieval experiments using a large collection of social images (8.5M) to systematically analyze where users should be wary, and how both photo taking and editing practices impact the performance of geo-location estimation. We find that practices that are currently widespread are already sufficient to single-handedly protect the geo-location of (i.e., to 'geo-cloak') up to more than 50% of images whose location would otherwise be automatically predictable. Our conclusion is that protecting users against the unwanted effects of visual retrieval is a viable research field, and should take as its starting point existing user practices.

preprint, 2014, arXiv

A Crowdsourcing Procedure for the Discovery of Non-Obvious Attributes of Social Image

Research on mid-level image representations has conventionally concentrated on relatively obvious attributes and overlooked non-obvious attributes, i.e., characteristics that are not readily observable when images are viewed independently of their context or function. Non-obvious attributes are not necessarily easily nameable, but nonetheless they play a systematic role in people's interpretation of images. Clusters of related non-obvious attributes, called interpretation dimensions, emerge when people are asked to compare images, and provide important insight on aspects of social images that are considered relevant. In contrast to aesthetic or affective approaches to image analysis, non-obvious attributes are not related to the personal perspective of the viewer. Instead, they encode a conventional understanding of the world, which is tacit, rather than explicitly expressed. This paper introduces a procedure for discovering non-obvious attributes using crowdsourcing. We discuss this procedure using a concrete example of a crowdsourcing task on Amazon Mechanical Turk carried out in the domain of fashion. An analysis comparing discovered non-obvious attributes with user tags demonstrated the added value delivered by our procedure.

preprint, 2014, arXiv

Corpus Development for Affective Video Indexing

Affective video indexing is the area of research that develops techniques to automatically generate descriptions of video content that encode the emotional reactions which the video content evokes in viewers. This paper provides a set of corpus development guidelines based on state-of-the-art practice intended to support researchers in this field. Affective descriptions can be used for video search and browsing systems offering users affective perspectives. The paper is motivated by the observation that affective video indexing has yet to fully profit from the standard corpora (data sets) that have benefited conventional forms of video indexing. Affective video indexing faces unique challenges, since viewer-reported affective reactions are difficult to assess. Moreover, affect assessment efforts must be carefully designed in order to both cover the types of affective responses that video content evokes in viewers and also capture the stable and consistent aspects of these responses. We first present background information on affect and multimedia and related work on affective multimedia indexing, including existing corpora. Three dimensions emerge as critical for affective video corpora, and form the basis for our proposed guidelines: the context of viewer response, personal variation among viewers, and the effectiveness and efficiency of corpus creation. Finally, we present examples of three recent corpora and discuss how these corpora make progressive steps towards fulfilling the guidelines.

preprint, 2013, arXiv

Exploiting Social Tags for Cross-Domain Collaborative Filtering

One of the most challenging problems in recommender systems based on the collaborative filtering (CF) concept is data sparseness, i.e., limited user preference data is available for making recommendations. Cross-domain collaborative filtering (CDCF) has been studied as an effective mechanism to alleviate data sparseness of one domain using the knowledge about user preferences from other domains. A key question to be answered in the context of CDCF is what common characteristics can be deployed to link different domains for effective knowledge transfer. In this paper, we assess the usefulness of user-contributed (social) tags in this respect. We do so by means of the Generalized Tag-induced Cross-domain Collaborative Filtering (GTagCDCF) approach that we propose in this paper and that we developed based on the general collective matrix factorization framework. Assessment is done by a series of experiments, using publicly available CF datasets that represent three cross-domain cases, i.e., two two-domain cases and one three-domain case. A comparative analysis on two-domain cases involving GTagCDCF and several state-of-the-art CDCF approaches indicates the increased benefit of using social tags as representatives of explicit links between domains for CDCF as compared to the implicit links deployed by the existing CDCF methods. In addition, we show that users from different domains can already benefit from GTagCDCF if they only share a few common tags. Finally, we use the three-domain case to validate the robustness of GTagCDCF with respect to the scale of datasets and the varying number of domains.

preprint, 2013, arXiv

GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains

Recommender systems are frequently used in domains in which users express their preferences in the form of graded judgments, such as ratings. If accurate top-N recommendation lists are to be produced for such graded relevance domains, it is critical to generate a ranked list of recommended items directly rather than predicting ratings. Current techniques choose one of two sub-optimal approaches: either they optimize for a binary metric such as Average Precision, which discards information on relevance grades, or they optimize for Normalized Discounted Cumulative Gain (NDCG), which ignores the dependence of an item's contribution on the relevance of more highly ranked items. In this paper, we address the shortcomings of existing approaches by proposing the Graded Average Precision factor model (GAPfm), a latent factor model that is particularly suited to the problem of top-N recommendation in domains with graded relevance data. The model optimizes for Graded Average Precision, a metric that has been proposed recently for assessing the quality of ranked result lists for graded relevance. GAPfm learns a latent factor model by directly optimizing a smoothed approximation of GAP. GAPfm's advantages are twofold: it maintains full information about graded relevance and also addresses the limitations of models that optimize NDCG. Experimental results show that GAPfm achieves substantial improvements on the top-N recommendation task, compared to several state-of-the-art approaches. In order to ensure that GAPfm is able to scale to very large data sets, we propose a fast learning algorithm that uses an adaptive item selection strategy. A final experiment shows that GAPfm is useful not only for generating recommendation lists, but also for ranking a given list of rated items.
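The claim that a binary metric discards relevance grades can be checked with a small sketch of the two baseline metrics named in the abstract. This is not an implementation of GAP or GAPfm; it uses the standard textbook definitions of Average Precision (after thresholding grades) and NDCG with exponential gain, and the function names and the `threshold` parameter are chosen here for illustration.

```python
import math


def average_precision(grades, threshold=1):
    """Binary AP: an item counts as relevant when its grade >= threshold."""
    hits, total = 0, 0.0
    for rank, grade in enumerate(grades, start=1):
        if grade >= threshold:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / hits if hits else 0.0


def dcg(grades):
    """Discounted cumulative gain with gain 2**grade - 1 and log2 discount."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades, start=1)
    )


def ndcg(grades):
    """DCG normalized by the DCG of the ideal (descending-grade) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0


# Two rankings of the same items, given as relevance grades by position.
# They differ only in whether the grade-3 or the grade-1 item is ranked first.
list_a = [3, 1, 0, 2]
list_b = [1, 3, 0, 2]
```

Here `average_precision(list_a)` equals `average_precision(list_b)` because both lists have relevant items at the same positions once grades are thresholded, while `ndcg(list_a)` is larger than `ndcg(list_b)` because NDCG rewards placing the grade-3 item first.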
