Researcher profile

Ryoma Sato

Ryoma Sato's listed work covers optimal transport, graph neural networks, user-side recommender and search systems, and causal analysis of scholarly publishing.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works · 0 followers · 6 topics · 4 close collaborators

Published work

14 published items

Preprint · 2022 · arXiv

Approximating 1-Wasserstein Distance with Trees

Wasserstein distance, which measures the discrepancy between distributions, has proven effective in various natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where TWD is a 1-Wasserstein distance with tree-based embedding that can be computed in linear time with respect to the number of nodes on a tree. More specifically, we propose a simple yet efficient L1-regularized approach to learning the weights of the edges in a tree. To this end, we first show that the 1-Wasserstein approximation problem can be formulated as a distance approximation problem using the shortest path distance on a tree. We then show that the shortest path distance can be represented by a linear model and that learning it can be formulated as a Lasso-based regression problem. Owing to the convex formulation, we can obtain a globally optimal solution efficiently. Moreover, we propose a tree-sliced variant of these methods. Through experiments, we demonstrate that the weighted TWD can accurately approximate the original 1-Wasserstein distance.
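
The linear-time claim rests on a closed form: on a tree with edge lengths w_e, the 1-Wasserstein distance between two distributions equals the sum over edges of w_e times the absolute difference of the masses the two distributions place in the subtree below that edge. The sketch below evaluates that formula in a single bottom-up pass. The parent-array encoding (root 0, every parent numbered lower than its children) and the function name are our own assumptions, and the sketch does not include the Lasso step that learns the edge weights.

```python
import numpy as np


def tree_wasserstein(parent, weight, mu, nu):
    """Tree-Wasserstein distance between two distributions on the nodes of a tree.

    Assumed (hypothetical) encoding: node 0 is the root with parent[0] == -1,
    parent[v] < v for every other node, and weight[v] is the length of the edge
    between v and parent[v] (weight[0] is ignored).
    """
    # Signed mass difference per node; turned into subtree totals as we sweep upward.
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    # Visit children before parents, so diff[v] already holds the subtree sum of v.
    for v in range(len(parent) - 1, 0, -1):
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total


# Star tree: root 0 with leaves 1 and 2, unit edge lengths.
print(tree_wasserstein([-1, 0, 0], [0.0, 1.0, 1.0], [0, 1, 0], [0, 0, 1]))  # 2.0
```

Each node is visited once, which is where the linear dependence on the number of tree nodes comes from.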

Preprint · 2022 · arXiv

CLEAR: A Fully User-side Image Search System

We use many search engines on the Internet in our daily lives. However, they are not perfect. Their scoring functions may not model our intent, or they may accept only text queries even when we want to carry out a similar-image search. In such cases, we must compromise: we either continue to use the unsatisfactory service or leave it. Recently, a new solution, user-side search systems, has been proposed. In this framework, each user builds their own search system that meets their preferences, with a user-defined scoring function and a user-defined interface. Although the concept is appealing, it is still unclear whether this approach is feasible in practice. In this demonstration, we present CLEAR, the first fully user-side image search system, which realizes a similar-image search engine for Flickr. The challenge is that Flickr provides neither an official similar-image search engine nor a corresponding API. Nevertheless, CLEAR realizes one entirely on the user side: it uses no backend server at all, stores no images, and builds no search indices. This contrasts with traditional search systems, which require preparing a backend server and building a search index. Therefore, each user can easily deploy their own CLEAR engine, and the resulting service is custom-made and privacy-preserving. The online demo is available at https://clear.joisino.net. The source code is available at https://github.com/joisino/clear.

Preprint · 2022 · arXiv

Constant Time Graph Neural Networks

Recent advances in graph neural networks (GNNs) have led to state-of-the-art performance in various applications, including chemo-informatics, question-answering systems, and recommender systems. However, scaling these methods up to huge graphs, such as social networks and Web graphs, remains a challenge. In particular, existing methods for accelerating GNNs either lack theoretical guarantees on the approximation error or incur at least linear computation time. In this study, we reveal the query complexity of the uniform node sampling scheme for Message Passing Neural Networks, including GraphSAGE, graph attention networks (GATs), and graph convolutional networks (GCNs). Surprisingly, our analysis reveals that the complexity of the node sampling method is completely independent of the number of nodes, edges, and neighbors of the input and depends only on the error tolerance and confidence probability, while still providing a theoretical guarantee on the approximation error. To the best of our knowledge, this is the first paper to provide a theoretical guarantee of approximation for GNNs within constant time. Through experiments with synthetic and real-world datasets, we investigated the speed and precision of the node sampling scheme and validated our theoretical results.
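
The core idea, that a uniformly sampled subset of neighbors can stand in for all of them, can be illustrated for a single mean-aggregation step. This sketch is our own illustration, not the paper's multi-layer analysis: it assumes features scaled to [0, 1] and applies Hoeffding's inequality per feature coordinate, so the sample count depends only on the tolerance eps and the failure probability delta, never on the node's degree. The function names are hypothetical.

```python
import math

import numpy as np


def samples_needed(eps, delta):
    """Hoeffding bound: the mean of r i.i.d. samples in [0, 1] lies within eps of
    the true mean with probability at least 1 - delta (per coordinate)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sampled_mean_aggregation(neighbor_feats, eps, delta, rng):
    """Estimate the mean of a node's neighbor features from a uniform sample.

    The number of sampled neighbors depends only on eps and delta, not on how
    many neighbors the node has.
    """
    r = samples_needed(eps, delta)
    idx = rng.integers(0, len(neighbor_feats), size=r)  # uniform, with replacement
    return neighbor_feats[idx].mean(axis=0)


rng = np.random.default_rng(0)
feats = rng.random((200_000, 4))  # a hub node with 200,000 neighbors
approx = sampled_mean_aggregation(feats, eps=0.05, delta=0.01, rng=rng)
print(samples_needed(0.05, 0.01), np.abs(approx - feats.mean(axis=0)).max())
```

For eps = 0.05 and delta = 0.01 the sketch draws 1,060 neighbors whether the node has ten thousand neighbors or ten million.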

Preprint · 2022 · arXiv

Fixed Support Tree-Sliced Wasserstein Barycenter

The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, solving the Wasserstein barycenter problem is computationally expensive because computing the Wasserstein distance requires quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB), and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems and can be solved using projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by exploiting the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, with the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter.
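
A plain projected subgradient loop for the fixed-support problem fits in a few dozen lines. The sketch below assumes a parent-array tree encoding (root 0, parent[v] < v), minimizes the sum of tree-Wasserstein distances to the input distributions, and projects each step onto the probability simplex with the standard sort-based projection. The dense subtree-indicator matrix B and the 1/sqrt(t) step size are our own choices; the sketch does not reproduce the paper's faster subgradient computation or the tree-sliced variant, and all names are hypothetical.

```python
import numpy as np


def subtree_matrix(parent):
    """B[v, u] = 1 if node u lies in the subtree rooted at v (root 0, parent[v] < v)."""
    n = len(parent)
    B = np.eye(n)
    for v in range(n - 1, 0, -1):  # fold each child's row into its parent, bottom-up
        B[parent[v]] += B[v]
    return B


def project_to_simplex(y):
    """Euclidean projection onto {a : a >= 0, sum(a) = 1} (sort-based method)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(y) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(y - css[rho] / (rho + 1), 0.0)


def tw_objective(a, bs, B, w):
    """Sum over inputs b of the tree-Wasserstein distance sum_e w_e |(B(a - b))_e|."""
    return sum(float(w @ np.abs(B @ (a - b))) for b in bs)


def fs_tw_barycenter(parent, w, bs, steps=500, lr=0.1):
    """Projected subgradient descent for a fixed-support tree-Wasserstein barycenter."""
    B = subtree_matrix(parent)
    w = np.asarray(w, dtype=float)  # w[v] = length of edge (v, parent[v]); w[0] = 0
    bs = [np.asarray(b, dtype=float) for b in bs]
    a = np.full(len(parent), 1.0 / len(parent))  # start from the uniform distribution
    best, best_val = a.copy(), tw_objective(a, bs, B, w)
    for t in range(1, steps + 1):
        # A subgradient of |x| is sign(x); chain rule through the linear map B.
        g = sum(B.T @ (w * np.sign(B @ (a - b))) for b in bs)
        a = project_to_simplex(a - lr / np.sqrt(t) * g)
        val = tw_objective(a, bs, B, w)
        if val < best_val:  # subgradient steps are not monotone, so keep the best iterate
            best, best_val = a.copy(), val
    return best, best_val


# Path 0 - 1 - 2 with unit edges; inputs are point masses at nodes 0, 2 and 2.
bary, value = fs_tw_barycenter([-1, 0, 1], [0.0, 1.0, 1.0],
                               [[1, 0, 0], [0, 0, 1], [0, 0, 1]])
print(np.round(bary, 3), round(value, 3))
```

Each iteration costs a dense matrix product here; the efficiency gains the abstract reports come from avoiding exactly this kind of naive computation.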

Preprint · 2022 · arXiv

Poincare: Recommending Publication Venues via Treatment Effect Estimation

Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on researchers' experience, which often leads to suboptimal results. Although venue recommender systems for academic papers exist, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper would receive if it were published in each venue and recommend the venue where the paper has the most potential impact. However, this task poses two challenges. First, a paper is published in only one venue, so we cannot observe the number of citations it would have received had it been published in another venue. Second, the contents of a paper and its publication venue are not statistically independent; that is, there are selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to effectively estimate the potential impact of choosing a publication venue and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences.

Preprint · 2022 · arXiv

Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data?

Fairness is a crucial property in recommender systems. Although some online services have recently adopted fairness-aware systems, many others have not. In this work, we propose methods that enable users to build their own fair recommender systems. Our methods can generate fair recommendations even when the service does not (or cannot) provide a fair recommender system. The key challenge is that a user has no access to the log data of other users or to the latent representations of items. This restriction prevents us from adopting existing methods designed for service providers. The main idea is that a user does have access to the unfair recommendations shown by the service provider. Our methods leverage the outputs of an unfair recommender system to construct a new, fair recommender system. We empirically validate that our proposed method substantially improves fairness without much loss in the performance of the original unfair system.

Preprint · 2022 · arXiv

Re-evaluating Word Mover's Distance

The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. The crux of WMD is that it exploits the underlying geometry of the word space through an optimal transport formulation. The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins on various datasets. In this paper, we point out that the evaluation in the original study could be misleading. We re-evaluate the performance of WMD and the classical baselines and find that the classical baselines are competitive with WMD if we apply appropriate preprocessing, i.e., L1 normalization. In addition, we introduce an analogy between WMD and L1-normalized BOW and find that not only the performance of WMD but also its distance values resemble those of BOW in high-dimensional spaces.
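
The preprocessing singled out above is easy to state in code: divide each bag-of-words count vector by its total so it becomes a distribution over words, then compare documents. This is our illustration, and using the L1 distance for the comparison is an assumption, since the abstract does not name the distance. A standard fact makes the link to WMD concrete: if every pair of distinct words were at distance 1, the optimal transport cost between two normalized histograms would be exactly half of this L1 distance (the total variation distance).

```python
from collections import Counter


def l1_normalized_bow(tokens):
    """Bag-of-words counts divided by document length, so the weights sum to 1."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def bow_l1_distance(doc_a, doc_b):
    """L1 distance between the L1-normalized bag-of-words vectors of two documents."""
    pa, pb = l1_normalized_bow(doc_a), l1_normalized_bow(doc_b)
    vocab = set(pa) | set(pb)
    return sum(abs(pa.get(w, 0.0) - pb.get(w, 0.0)) for w in vocab)


short = "the cat sat on the mat".split()
repeated = short * 3  # same content, three times as long
print(bow_l1_distance(short, repeated))          # 0.0
print(bow_l1_distance(["a", "b"], ["a", "c"]))   # 1.0
```

Without the normalization step, the repeated document would look far from the original simply because its counts are three times larger.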

Preprint · 2022 · arXiv

Towards Principled User-side Recommender Systems

Traditionally, recommendation algorithms have been designed for service developers. Recently, however, a new paradigm called user-side recommender systems has been proposed, which enables web service users to construct their own recommender systems without access to trade-secret data. This approach opens the door to user-defined fair systems even if the official recommender system of the service is not fair. While existing methods for user-side recommender systems have addressed the challenging problem of building recommender systems without using log data, they rely on heuristic approaches, and it remains unclear whether constructing user-side recommender systems is a well-defined problem from a theoretical point of view. In this paper, we provide a theoretical justification of user-side recommender systems. Specifically, we show that hidden item features can be recovered from the information available to the user, making the construction of user-side recommender systems well-defined. However, this theoretically grounded approach is not efficient. To realize practical yet theoretically sound recommender systems, we propose three desirable properties of user-side recommender systems and, building on these foundations, propose Consul, an effective and efficient user-side recommender system. We prove that Consul satisfies all three properties, whereas existing user-side recommender systems lack at least one of them. In numerical experiments, we empirically validate the theory of feature recovery. We also show that our proposed method achieves an excellent trade-off between effectiveness and efficiency, and we demonstrate via case studies that it can retrieve information that the provider's official recommender system cannot.

Preprint · 2022 · arXiv

Twin Papers: A Simple Framework of Causal Inference for Citations via Coupling

The research process involves many decisions, e.g., how to title a paper and where to publish it. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty is that investigating these effects requires counterfactual results, which are not available in reality. The key insight of our framework is inspired by existing counterfactual analyses using twins, in which researchers regard twins as counterfactual units. The proposed framework regards a pair of papers that cite each other as twins. Such papers tend to be parallel works on similar topics in similar communities. We investigate twin papers that adopted different decisions, observe the progress of the research impact of these studies, and estimate the effect of the decisions from the difference in their impacts. We release our code and data, which we believe will be highly beneficial given the scarcity of datasets for counterfactual studies.
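
A toy version of the pairing step: treat two papers as twins when each cites the other, keep the pairs whose decisions differ, and average the impact difference. Everything here is hypothetical (the paper IDs, the binary decision flag, using citation counts as impact, and a plain mean of differences); it illustrates the coupling idea rather than reproducing the authors' released code.

```python
from statistics import mean


def twin_pairs(cites):
    """Unordered pairs of papers that cite each other (cites: paper -> set of cited papers)."""
    pairs = set()
    for p, refs in cites.items():
        for q in refs:
            if p in cites.get(q, set()):
                pairs.add(tuple(sorted((p, q))))
    return sorted(pairs)


def twin_effect(cites, decision, impact):
    """Mean impact difference (decision adopted minus not adopted) over twins that differ."""
    diffs = []
    for p, q in twin_pairs(cites):
        if decision[p] != decision[q]:
            treated, control = (p, q) if decision[p] else (q, p)
            diffs.append(impact[treated] - impact[control])
    return mean(diffs) if diffs else None


# Hypothetical toy data: E cites A, but A does not cite E, so E has no twin.
cites = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}, "E": {"A"}}
decision = {"A": True, "B": False, "C": False, "D": True, "E": True}
impact = {"A": 30, "B": 20, "C": 5, "D": 11, "E": 100}
print(twin_pairs(cites), twin_effect(cites, decision, impact))
```

On this toy data the two twin pairs give differences of 10 and 6 citations, so the estimated effect is 8; the unpaired paper E, despite its large impact, does not enter the estimate.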

Preprint · 2022 · arXiv

Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem

Word embeddings are one of the most fundamental technologies in natural language processing. Existing word embeddings are high-dimensional and consume considerable computational resources. In this study, we propose WordTour, unsupervised one-dimensional word embeddings. To achieve this challenging goal, we decompose the desiderata of word embeddings into two parts, completeness and soundness, and focus on soundness in this paper. Because it is one-dimensional, WordTour is extremely efficient and provides a minimal means of handling word embeddings. We experimentally confirmed the effectiveness of the proposed method via a user study and document classification.
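
The title describes the construction: order the vocabulary along a short closed tour through embedding space, so that words adjacent in the order are close, and use each word's position as its one-dimensional embedding. The sketch below is an assumption-laden stand-in, not the authors' pipeline. It builds the tour with a nearest-neighbor heuristic followed by 2-opt improvement instead of a dedicated TSP solver, uses Euclidean distance between input vectors, and is practical only for small vocabularies because it materializes the full distance matrix.

```python
import numpy as np


def word_tour(emb):
    """Order the rows of emb along a short closed tour (nearest neighbor + 2-opt)."""
    n = len(emb)
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)

    # Greedy construction: always walk to the closest unvisited word.
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[tour[-1], j])
        tour.append(nxt)
        unvisited.remove(nxt)

    # 2-opt: reverse a segment whenever that shortens the closed tour.
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                a, b = tour[i - 1], tour[i]
                c, d = tour[j], tour[(j + 1) % n]
                if dist[a, c] + dist[b, d] < dist[a, b] + dist[c, d] - 1e-12:
                    tour[i:j + 1] = tour[i:j + 1][::-1]
                    improved = True
    return tour


def one_dimensional_embedding(words, emb):
    """Map each word to its position along the tour."""
    return {words[idx]: pos for pos, idx in enumerate(word_tour(emb))}


# Hypothetical 2-D vectors for a five-word vocabulary.
words = ["cat", "dog", "car", "truck", "kitten"]
vecs = np.array([[0.0, 1.0], [0.3, 1.1], [5.0, 0.0], [5.4, 0.2], [-0.2, 0.9]])
print(one_dimensional_embedding(words, vecs))
```

Because the tour is closed, the first and last positions are also neighbors; how the cycle is cut into a line is a further design choice this sketch leaves open.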

Preprint · 2021 · arXiv

Fast and Robust Comparison of Probability Measures in Heterogeneous Spaces

Comparing two probability measures supported on heterogeneous spaces is an increasingly important problem in machine learning. Such problems arise, for instance, when comparing two populations of biological cells, each described with its own set of features, or when looking at families of word embeddings trained across different corpora or languages. For such settings, the Gromov-Wasserstein (GW) distance is often presented as the gold standard. GW is intuitive, as it quantifies whether one measure can be isomorphically mapped to the other. However, its exact computation is intractable, and most algorithms that claim to approximate it remain expensive. Building on Mémoli (2011), who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations. Our main contribution is a sweep line algorithm that computes AE exactly in log-quadratic time, where a naive implementation would be cubic. This is quasi-linear with respect to the description of the problem itself. Our second contribution is the proposal of robust variants of AE and AW that use rank statistics rather than the original distances. We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations. Code is available at https://github.com/joisino/anchor-energy.

Preprint · 2021 · arXiv

Fast Unbalanced Optimal Transport on a Tree

This study examines the time complexities of unbalanced optimal transport problems from an algorithmic perspective for the first time. We reveal which unbalanced optimal transport problems can and cannot be solved efficiently. Specifically, we prove that the Kantorovich-Rubinstein distance and optimal partial transport in the Euclidean metric cannot be computed in strongly subquadratic time under the strong exponential time hypothesis. We then propose an algorithm that exactly solves a more general unbalanced optimal transport problem in quasi-linear time on a tree metric. The proposed algorithm processes a tree with one million nodes in less than one second. Our analysis lays a foundation for the theoretical study of unbalanced optimal transport algorithms and opens the door to applying unbalanced optimal transport to million-scale datasets.

Preprint · 2021 · arXiv

Random Features Strengthen Graph Neural Networks

Graph neural networks (GNNs) are powerful machine learning models for various graph learning tasks. Recently, the limitations of the expressive power of various GNN models have been revealed. For example, GNNs cannot distinguish some non-isomorphic graphs and they cannot learn efficient graph algorithms. In this paper, we demonstrate that GNNs become powerful just by adding a random feature to each node. We prove that the random features enable GNNs to learn almost optimal polynomial-time approximation algorithms for the minimum dominating set problem and maximum matching problem in terms of approximation ratios. The main advantage of our method is that it can be combined with off-the-shelf GNN models with slight modifications. Through experiments, we show that the addition of random features enables GNNs to solve various problems that normal GNNs, including the graph convolutional networks (GCNs) and graph isomorphism networks (GINs), cannot solve.
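
The modification itself is tiny: concatenate one random value to every node's input features before running an ordinary message-passing network. The sketch below is a hypothetical minimal example, not any specific published architecture: a single sum-aggregation layer with tanh and random weights, and uniform [0, 1) random values as an assumed distribution. On a 6-cycle where all nodes start with identical features, the plain layer cannot tell the nodes apart, while the layer with random features gives them distinct embeddings.

```python
import numpy as np


def add_random_features(x, rng):
    """Append one uniform [0, 1) random value to each node's feature vector."""
    return np.concatenate([x, rng.random((x.shape[0], 1))], axis=1)


def message_passing(adj, h, w_self, w_nbr):
    """One sum-aggregation layer: h_v' = tanh(h_v W_self + sum_{u in N(v)} h_u W_nbr)."""
    return np.tanh(h @ w_self + adj @ h @ w_nbr)


def cycle_adjacency(n):
    """Adjacency matrix of the n-cycle (every node has exactly two neighbors)."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj


rng = np.random.default_rng(0)
adj = cycle_adjacency(6)
x = np.ones((6, 1))  # identical input features on every node

plain = message_passing(adj, x, rng.normal(size=(1, 8)), rng.normal(size=(1, 8)))
x_rf = add_random_features(x, rng)
with_rf = message_passing(adj, x_rf, rng.normal(size=(2, 8)), rng.normal(size=(2, 8)))

print(np.allclose(plain, plain[0]))      # True: every node gets the same embedding
print(np.allclose(with_rf, with_rf[0]))  # False: random features break the symmetry
```

Since the random column is just one extra input dimension, the only change an existing model needs is a wider first weight matrix.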

Preprint · 2021 · arXiv

Retrieving Black-box Optimal Images from External Databases

Suppose we have a black-box function (e.g., a deep neural network) that takes an image as input and outputs a value indicating preference. How can we retrieve optimal images with respect to this function from an external database on the Internet? Standard retrieval problems in the literature (e.g., item recommendation) assume that an algorithm has full access to the set of items; in other words, such algorithms are designed for service providers. In this paper, we consider the retrieval problem under different assumptions. Specifically, we consider how users with limited access to an image database can retrieve images using their own black-box functions. This formulation enables flexible, finer-grained image search defined by each user. We assume the user can access the database through search queries with tight API limits, so a user needs to retrieve optimal images efficiently in terms of the number of queries. We propose Tiara, an efficient retrieval algorithm for this problem. In the experiments, we confirm that our proposed method performs better than several baselines under various settings.