Researcher profile

Michael D. Ekstrand

Michael D. Ekstrand contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging). Verification L1. Unclaimed author.
9 works
0 followers
6 topics
4 close collaborators

Published work

9 published items

preprint, 2023, arXiv

Much Ado About Gender: Current Practices and Future Recommendations for Appropriate Gender-Aware Information Access

Information access research (and development) sometimes makes use of gender, whether to report on the demographics of participants in a user study, as inputs to personalized results or recommendations, or to make systems gender-fair, amongst other purposes. This work makes a variety of assumptions about gender, however, that are not necessarily aligned with current understandings of what gender is, how it should be encoded, and how a gender variable should be ethically used. In this work, we present a systematic review of papers on information retrieval and recommender systems that mention gender in order to document how gender is currently being used in this field. We find that most papers mentioning gender do not use an explicit gender variable, but most of those that do either focus on contextualizing results of model performance, personalizing a system based on assumptions of user gender, or auditing a model's behavior for fairness or other privacy-related issues. Moreover, most of the papers we review rely on a binary notion of gender, even if they acknowledge that gender cannot be split into two categories. We connect these findings with scholarship on gender theory and recent work on gender in human-computer interaction and natural language processing. We conclude by making recommendations for ethical and well-grounded use of gender in building and researching information access systems.

preprint, 2022, arXiv

Comparing Fair Ranking Metrics

Ranked lists are frequently used by information retrieval (IR) systems to present results believed to be relevant to the user's information need. Fairness is a relatively new but important aspect of these rankings to measure, joining a rich set of metrics that go beyond traditional accuracy or utility constructs to provide a more holistic understanding of IR system behavior. In the last few years, several metrics have been proposed to quantify the (un)fairness of rankings, particularly with respect to particular group(s) of content providers, but comparative analyses of these metrics -- particularly for IR -- are lacking. There is limited guidance, therefore, to decide what fairness metrics are applicable to a specific scenario, or assessment of the extent to which metrics agree or disagree when applied to real data. In this paper, we describe several fair ranking metrics from existing literature in a common notation, enabling direct comparison of their assumptions, goals, and design choices; we then empirically compare them on multiple data sets covering both search and recommendation tasks.
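
As a rough illustration of the kind of quantity provider-side fair ranking metrics work with, the sketch below computes each provider group's share of position-discounted exposure in a single ranked list. It is not one of the metrics compared in the paper: the logarithmic discount, the function name `group_exposure`, and the toy item and group labels are all assumptions made for this example.

```python
import math
from collections import defaultdict


def group_exposure(ranking, group_of):
    """Return each group's share of position-discounted exposure.

    Assumption for this sketch: the item at 1-based rank r receives
    exposure 1 / log2(r + 1), a DCG-style discount.
    """
    if not ranking:
        return {}
    totals = defaultdict(float)
    for rank, item in enumerate(ranking, start=1):
        totals[group_of[item]] += 1.0 / math.log2(rank + 1)
    norm = sum(totals.values())
    # Normalize so the shares across groups sum to 1.
    return {group: value / norm for group, value in totals.items()}


# Hypothetical toy data: four items from two provider groups.
ranking = ["a", "b", "c", "d"]
group_of = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}
shares = group_exposure(ranking, group_of)
print(shares)
```

Both groups contribute two items here, yet G1 receives the larger share of exposure because its items hold the top two positions. Metrics differ in what they compare such shares against (for example, group size or relevance), which is one reason a comparison in common notation is useful.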

preprint, 2022, arXiv

Fairness in Information Access Systems

Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that have been developed for studying other machine learning systems. While fair information access shares many commonalities with fair classification, the multistakeholder nature of information access applications, the rank-based problem setting, the centrality of personalization in many cases, and the role of user response complicate the problem of identifying precisely what types and operationalizations of fairness may be relevant, let alone measuring or promoting them. In this monograph, we present a taxonomy of the various dimensions of fair information access and survey the literature to date on this new and rapidly-growing topic. We preface this with brief introductions to information access and algorithmic fairness, to facilitate use of this work by scholars with experience in one (or neither) of these fields who wish to learn about their intersection. We conclude with several open problems in fair information access, along with some suggestions for how to approach research in this space.

preprint, 2022, arXiv

Fire Dragon and Unicorn Princess; Gender Stereotypes and Children's Products in Search Engine Responses

Search engines in e-commerce settings allow users to search, browse, and select items from a wide range of products available online, including children's items. Children's products such as toys, books, and learning materials often have stereotype-based gender associations. Both academic research and public campaigns are working to promote stereotype-free childhood development. However, to date, e-commerce search engines have not received as much attention as physical stores, product design, or marketing as a potential channel of gender stereotypes. To fill this gap, in this paper, we study the manifestations of gender stereotypes in e-commerce sites when responding to queries related to children's products by exploring query suggestions and search results. We have three primary contributions. First, we provide an aggregated list of children's products with associated gender stereotypes from the existing body of research. Second, we provide preliminary methods for identifying and quantifying gender stereotypes in systems' responses. Third, to show the importance of attending to this problem, we identify the existence of gender stereotypes in query suggestions and search results across multiple e-commerce sites.

preprint, 2022, arXiv

Matching Consumer Fairness Objectives & Strategies for RecSys

The last several years have brought a growing body of work on ensuring that recommender systems are in some sense consumer-fair -- that is, they provide comparable quality of service, accuracy of representation, and other effects to their users. However, there are many different strategies to make systems more fair and a range of intervention points. In this position paper, we build on ongoing work to highlight the need for researchers and practitioners to attend to the details of their application, users, and the fairness objective they aim to achieve, and adopt interventions that are appropriate to the situation. We argue that consumer fairness should be a creative endeavor flowing from the particularities of the specific problem to be solved.

preprint, 2020, arXiv

Estimating Error and Bias in Offline Evaluation Results

Offline evaluations of recommender systems attempt to estimate users' satisfaction with recommendations using static data from prior user interactions. These evaluations provide researchers and developers with first approximations of the likely performance of a new system and help weed out bad ideas before presenting them to users. However, offline evaluation cannot accurately assess novel, relevant recommendations, because the most novel items were previously unknown to the user, so they are missing from the historical data and cannot be judged as relevant. We present a simulation study to estimate the error that such missing data causes in commonly-used evaluation metrics in order to assess its prevalence and impact. We find that missing data in the rating or observation process causes the evaluation protocol to systematically mis-estimate metric values, and in some cases erroneously determine that a popularity-based recommender outperforms even a perfect personalized recommender. Substantial breakthroughs in recommendation quality, therefore, will be difficult to assess with existing offline techniques.
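
The effect described above can be seen in a deliberately tiny, deterministic example. This is a sketch of the underlying reasoning and not the simulation from the paper: the item names, the two recommenders, and the choice of precision@2 are all hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in `relevant`."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k


# Hypothetical ground truth: the user would like one popular item and
# two niche items they have never encountered.
true_relevant = {"pop1", "niche1", "niche2"}

# Only the popular item appears in the historical data, because the
# user never saw (and so never rated) the niche items.
observed_relevant = {"pop1"}

perfect_recs = ["niche1", "niche2"]  # novel and truly relevant
popular_recs = ["pop1", "pop2"]      # most-popular baseline

true_scores = {
    "perfect": precision_at_k(perfect_recs, true_relevant, 2),
    "popular": precision_at_k(popular_recs, true_relevant, 2),
}
observed_scores = {
    "perfect": precision_at_k(perfect_recs, observed_relevant, 2),
    "popular": precision_at_k(popular_recs, observed_relevant, 2),
}
print(true_scores, observed_scores)
```

Against the full ground truth the personalized list scores higher than the popularity baseline, but against the observed data alone the ordering flips, since none of the novel items can be judged relevant.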

preprint, 2020, arXiv

Exploring Author Gender in Book Rating and Recommendation

Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of the patterns in rating datasets reflect important real-world differences between the various users and items in the data; other patterns may be irrelevant or possibly undesirable for social or ethical reasons, particularly if they reflect undesired discrimination, such as discrimination in publishing or purchasing against authors who are women or ethnic minorities. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to a dimension of social concern, namely content creator gender. Using publicly-available book ratings data, we measure the distribution of the genders of the authors of books in user rating profiles and recommendation lists produced from this data. We find that common collaborative filtering algorithms differ in the gender distribution of their recommendation lists, and in the relationship of that output distribution to user profile distribution.

preprint, 2020, arXiv

LensKit for Python: Next-Generation Software for Recommender System Experiments

LensKit is an open-source toolkit for building, researching, and learning about recommender systems. First released in 2010 as a Java framework, it has supported diverse published research, small-scale production deployments, and education in both MOOC and traditional classroom settings. In this paper, I present the next generation of the LensKit project, re-envisioning the original tool's objectives as a flexible Python package for supporting recommender systems research and development. LensKit for Python (LKPY) enables researchers and students to build robust, flexible, and reproducible experiments that make use of the large and growing PyData and Scientific Python ecosystem, including scikit-learn, TensorFlow, and PyTorch. To that end, it provides classical collaborative filtering implementations, recommender system evaluation metrics, data preparation routines, and tools for efficiently batch running recommendation algorithms, all usable in any combination with each other or with other Python software. This paper describes the design goals, use cases, and capabilities of LKPY, contextualized in a reflection on the successes and failures of the original LensKit for Java software.

preprint, 2020, arXiv

Overview of the TREC 2019 Fair Ranking Track

The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstracts given a query. The objective was to fairly represent relevant authors from several groups that were unknown at the system submission time. Thus, the track emphasized the development of systems that have robust performance across a variety of group definitions. Participants were provided with query log data (queries, documents, and relevance) from Semantic Scholar. This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process, as well as a comparison of the performance of submitted systems.