Researcher profile

Yuichi Yoshida

Yuichi Yoshida contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
10 works
0 followers
8 topics
4 close collaborators


Published work

10 published items

preprint · 2022 · arXiv

Low Degree Testing over the Reals

We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that given query access to $f$, and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ many queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried on rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.
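As a rough illustration of what a one-sided low-degree test looks like, the sketch below uses the classical line-restriction idea: a polynomial of degree at most $d$, restricted to any line $t \mapsto x + t h$, is a univariate polynomial of degree at most $d$, so its $(d+1)$-th forward difference vanishes. This is not the tester from the paper. The names `finite_difference`, `line_test`, `cubic`, and `sample_point`, the integer direction range, and the use of exact rational arithmetic are all assumptions made for illustration, and `sample_point` merely stands in for sample access to the unknown distribution $\mathcal{D}$.

```python
from fractions import Fraction
from math import comb
import random


def finite_difference(f, x, h, d):
    """(d+1)-th forward difference of t -> f(x + t*h), evaluated at t = 0."""
    k = d + 1
    total = Fraction(0)
    for i in range(k + 1):
        point = [xj + i * hj for xj, hj in zip(x, h)]
        total += (-1) ** (k - i) * comb(k, i) * f(point)
    return total


def line_test(f, sample_point, n, d, trials=50, rng=None):
    """Accept (True) iff every sampled line passes the finite-difference check.

    A polynomial of degree <= d always passes, so the error is one-sided.
    The direction distribution below is an arbitrary choice for the sketch.
    """
    rng = rng or random.Random(0)
    for _ in range(trials):
        x = sample_point(rng)  # hypothetical stand-in for a sample from D
        h = [Fraction(rng.randint(-5, 5)) for _ in range(n)]
        if finite_difference(f, x, h, d) != 0:
            return False
    return True


def cubic(x):
    """Example input: x0^2 * x1 + 3 * x1, a polynomial of degree 3."""
    return x[0] * x[0] * x[1] + 3 * x[1]
```

With a sampler such as `lambda rng: [Fraction(rng.randint(-10, 10)) for _ in range(2)]`, `cubic` is always accepted for `d=3`, while for `d=2` a single line with nonzero direction coordinates already exposes the cubic term.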

preprint · 2022 · arXiv

On the three graph invariants related to matching of finite simple graphs

Let $G$ be a finite simple graph on the vertex set $V(G)$ and let $\text{ind-match}(G)$, $\text{min-match}(G)$ and $\text{match}(G)$ denote the induced matching number, the minimum matching number and the matching number of $G$, respectively. It is known that the inequalities $\text{ind-match}(G) \leq \text{min-match}(G) \leq \text{match}(G) \leq 2\text{min-match}(G)$ and $\text{match}(G) \leq \left\lfloor |V(G)|/2 \right\rfloor$ hold in general. In the present paper, we determine the possible tuples $(p, q, r, n)$ with $\text{ind-match}(G) = p$, $\text{min-match}(G) = q$, $\text{match}(G) = r$ and $|V(G)| = n$ arising from connected simple graphs. As an application of this result, we also determine the possible tuples $(p', q, r, n)$ with ${\rm{reg}}(G) = p'$, $\text{min-match}(G) = q$, $\text{match}(G) = r$ and $|V(G)| = n$ arising from connected simple graphs, where $I(G)$ is the edge ideal of $G$ and ${\rm{reg}}(G) = {\rm{reg}}(K[V(G)]/I(G))$ is the Castelnuovo--Mumford regularity of the quotient ring $K[V(G)]/I(G)$.
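To make the three invariants concrete, here is a brute-force sketch that computes them for a small graph by enumerating all edge subsets. It assumes the standard definitions: $\text{match}(G)$ is the largest size of a matching, $\text{min-match}(G)$ is the smallest size of a maximal matching, and $\text{ind-match}(G)$ is the largest size of a matching whose covered vertices induce no other edges. The function names are illustrative, and the enumeration is exponential in the number of edges, so it is only suitable for tiny examples.

```python
from itertools import combinations


def is_matching(edges):
    """True if no vertex appears in two of the given edges."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def matching_invariants(graph_edges):
    """Return (ind-match, min-match, match) of a simple graph by brute force."""
    edges = [tuple(sorted(e)) for e in graph_edges]
    ind_match, min_match, match = 0, None, 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if not is_matching(subset):
                continue
            covered = {v for e in subset for v in e}
            match = max(match, k)
            # Maximal matching: every graph edge touches a covered vertex.
            if min_match is None and all(u in covered or v in covered for u, v in edges):
                min_match = k  # k only increases, so the first hit is the minimum
            # Induced matching: the only edges among covered vertices are the k chosen ones.
            inside = [e for e in edges if e[0] in covered and e[1] in covered]
            if len(inside) == k:
                ind_match = max(ind_match, k)
    return ind_match, min_match, match


path4 = [(1, 2), (2, 3), (3, 4)]
cycle5 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
```

For the path on four vertices this gives $(1, 1, 2)$, and for the 5-cycle it gives $(1, 2, 2)$; both satisfy the chain of inequalities quoted in the abstract.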

preprint · 2022 · arXiv

Sparsification of Decomposable Submodular Functions

Submodular functions are at the core of many machine learning and data mining tasks. The underlying submodular functions for many of these tasks are decomposable, i.e., they are sums of several simple submodular functions. In many data-intensive applications, however, the number of underlying submodular functions in the original function is so large that processing it takes a prohibitively large amount of time and/or it does not even fit in main memory. To overcome this issue, we introduce the notion of sparsification for decomposable submodular functions, whose objective is to obtain an accurate approximation of the original function that is a (weighted) sum of only a few submodular functions. Our main result is a polynomial-time randomized sparsification algorithm such that the expected number of functions used in the output is independent of the number of underlying submodular functions in the original function. We also study the effectiveness of our algorithm under various constraints such as matroid and cardinality constraints. We complement our theoretical analysis with an empirical study of the performance of our algorithm.

preprint · 2021 · arXiv

RelWalk: A Latent Variable Model Approach to Knowledge Graph Embedding

Embedding entities and relations of a knowledge graph in a low-dimensional space has shown impressive performance in predicting missing links between entities. Although progress has been made, existing methods are heuristically motivated and theoretical understanding of such embeddings is comparatively underdeveloped. This paper extends the random walk model (Arora et al., 2016a) of word embeddings to Knowledge Graph Embeddings (KGEs) to derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail). Moreover, we show that marginal loss minimisation, a popular objective used in much prior work in KGE, follows naturally from the log-likelihood ratio maximisation under the probabilities estimated from the KGEs according to our theoretical relationship. We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph. Using the derived objective, accurate KGEs are learnt from FB15K237 and WN18RR benchmark datasets, providing empirical evidence in support of the theory.

preprint · 2020 · arXiv

Approximability of Monotone Submodular Function Maximization under Cardinality and Matroid Constraints in the Streaming Model

Maximizing a monotone submodular function under various constraints is a classical and intensively studied problem. However, in the single-pass streaming model, where the elements arrive one by one and an algorithm can store only a small fraction of input elements, there is a large gap in our knowledge, even though several approximation algorithms have been proposed in the literature. In this work, we present the first lower bound on the approximation ratios for cardinality and matroid constraints that beats $1-\frac{1}{e}$ in the single-pass streaming model. Let $n$ be the number of elements in the stream. Then, we prove that any (randomized) streaming algorithm for a cardinality constraint with approximation ratio $\frac{2}{2+\sqrt{2}}+\varepsilon$ requires $\Omega\left(\frac{n}{K^2}\right)$ space for any $\varepsilon>0$, where $K$ is the size limit of the output set. We also prove that any (randomized) streaming algorithm for a (partition) matroid constraint with approximation ratio $\frac{K}{2K-1}+\varepsilon$ requires $\Omega\left(\frac{n}{K}\right)$ space for any $\varepsilon>0$, where $K$ is the rank of the given matroid. In addition, we give streaming algorithms when we only have a weak oracle with which we can only evaluate function values on feasible sets. Specifically, we show weak-oracle streaming algorithms for cardinality and matroid constraints with approximation ratios $\frac{K}{2K-1}$ and $\frac{1}{2}$, respectively, whose space complexity is exponential in $K$ but is independent of $n$. The former one exactly matches the known inapproximability result for a cardinality constraint in the weak oracle model. The latter one almost matches our lower bound of $\frac{K}{2K-1}$ for a matroid constraint, which almost settles the approximation ratio for a matroid constraint that can be obtained by a streaming algorithm whose space complexity is independent of $n$.
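For readers unfamiliar with the single-pass streaming model, the sketch below shows a simple single-threshold rule for a cardinality constraint: each arriving element is kept only if its marginal gain meets a fixed threshold, and at most $K$ elements are ever stored. This is a generic rule of the kind used in the streaming literature, not the algorithms or the lower-bound constructions of this paper. The coverage function, the set names, and the threshold value (which presumes a known guess of the optimum) are assumptions chosen purely for illustration.

```python
def coverage(sets, chosen):
    """Monotone submodular example: number of ground elements covered."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


def threshold_stream(stream, f, K, threshold):
    """One pass over the stream, storing at most K elements.

    An element is kept when its marginal gain w.r.t. the kept set is at
    least `threshold`; everything else is discarded and never revisited.
    """
    kept = []
    for item in stream:
        if len(kept) >= K:
            break
        gain = f(kept + [item]) - f(kept)
        if gain >= threshold:
            kept.append(item)
    return kept


# Hypothetical toy instance: the best pair is {"a", "c"} with value 7.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
f = lambda chosen: coverage(sets, chosen)
selected = threshold_stream(["a", "b", "c", "d"], f, K=2, threshold=7 / 4)
```

On this toy stream the rule skips `"b"` (marginal gain 1) and keeps `"a"` and `"c"`. Lowering the threshold to 1 would instead fill the budget with `"a"` and `"b"`, which shows how sensitive such rules are to the threshold choice.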

preprint · 2020 · arXiv

Average Sensitivity of Graph Algorithms

In modern applications of graph algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. Hence, it is desirable to use algorithms that, even when only a (large) subgraph is available, output solutions that are close to the solutions output when the whole graph is available. We formalize this idea by introducing the notion of average sensitivity of graph algorithms, which is the average earth mover's distance between the output distributions of an algorithm on a graph and its subgraph obtained by removing an edge, where the average is over the edges removed and the distance between two outputs is the Hamming distance. In this work, we initiate a systematic study of average sensitivity. After deriving basic properties of average sensitivity such as composition, we provide efficient approximation algorithms with low average sensitivities for concrete graph problems, including the minimum spanning forest problem, the global minimum cut problem, the minimum $s$-$t$ cut problem, and the maximum matching problem. In addition, we prove that the average sensitivity of our global minimum cut algorithm is almost optimal, by showing a nearly matching lower bound. We also show that every algorithm for the 2-coloring problem has average sensitivity linear in the number of vertices. One of the main ideas involved in designing our algorithms with low average sensitivity is the following fact: if the presence of a vertex or an edge in the solution output by an algorithm can be decided locally, then the algorithm has a low average sensitivity, allowing us to reuse the analyses of known sublinear-time algorithms and local computation algorithms (LCAs). Using this connection, we show that every LCA for 2-coloring has linear query complexity, thereby answering an open question.
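The definition above can be evaluated directly for a deterministic algorithm: its output distribution is a point mass, so the earth mover's distance reduces to the Hamming distance between the two outputs, which for edge sets is the size of their symmetric difference. The sketch below does this for a plain greedy maximal matching that scans edges in a fixed order. The greedy routine is chosen only as an easy example and is not one of the low-sensitivity algorithms from the paper; the function names are illustrative.

```python
def greedy_matching(edges):
    """Deterministic greedy maximal matching, scanning edges in list order."""
    matched, result = set(), set()
    for u, v in edges:
        if u not in matched and v not in matched:
            result.add((u, v))
            matched.update((u, v))
    return result


def average_sensitivity(algorithm, edges):
    """Average over removed edges of the Hamming distance between outputs.

    Valid as written only for deterministic algorithms whose output is a
    set of edges; a randomized algorithm would need an earth mover's
    distance between output distributions instead.
    """
    base = algorithm(edges)
    total = 0
    for i in range(len(edges)):
        reduced = edges[:i] + edges[i + 1:]
        total += len(base ^ algorithm(reduced))
    return total / len(edges)


# Path on 7 vertices: removing the first edge flips the entire matching.
path = [(i, i + 1) for i in range(1, 7)]
sensitivity = average_sensitivity(greedy_matching, path)
```

On this path the per-edge distances are 6, 0, 4, 0, 2, 0, so the average is 2.0. The same pattern on longer paths grows linearly with the path length, which illustrates why order-dependent greedy rules tend to have high average sensitivity.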

preprint · 2020 · arXiv

Average Sensitivity of Spectral Clustering

Spectral clustering is one of the most popular clustering methods for finding clusters in a graph, which has found many applications in data mining. However, the input graph in those applications may have many missing edges due to measurement error, withholding for privacy reasons, or arbitrariness in data conversion. To make reliable and efficient decisions based on spectral clustering, we assess the stability of spectral clustering against edge perturbations in the input graph using the notion of average sensitivity, which is the expected size of the symmetric difference of the output clusters before and after we randomly remove edges. We first prove that the average sensitivity of spectral clustering is proportional to $\lambda_2/\lambda_3^2$, where $\lambda_i$ is the $i$-th smallest eigenvalue of the (normalized) Laplacian. We also prove an analogous bound for $k$-way spectral clustering, which partitions the graph into $k$ clusters. Then, we empirically confirm our theoretical bounds by conducting experiments on synthetic and real networks. Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.

preprint · 2020 · arXiv

Fast and Private Submodular and $k$-Submodular Functions Maximization with Matroid Constraints

The problem of maximizing nonnegative monotone submodular functions under a certain constraint has been intensively studied in the last decade, and a wide range of efficient approximation algorithms have been developed for this problem. Many machine learning problems, including data summarization and influence maximization, can be naturally modeled as the problem of maximizing monotone submodular functions. However, when such applications involve sensitive data about individuals, their privacy concerns should be addressed. In this paper, we study the problem of maximizing monotone submodular functions subject to matroid constraints in the framework of differential privacy. We provide a $(1-\frac{1}{\mathrm{e}})$-approximation algorithm, which improves upon the previous results in terms of approximation guarantee. This is done with an almost cubic number of function evaluations in our algorithm. Moreover, we study $k$-submodularity, a natural generalization of submodularity. We give the first $\frac{1}{2}$-approximation algorithm that preserves differential privacy for maximizing monotone $k$-submodular functions subject to matroid constraints. The approximation ratio is asymptotically tight and is obtained with an almost linear number of function evaluations.

preprint · 2020 · arXiv

Sensitivity Analysis of the Maximum Matching Problem

We consider the sensitivity of algorithms for the maximum matching problem against edge and vertex modifications. Algorithms with low sensitivity are desirable because they are robust to edge failure or attack. In this work, we show a randomized $(1-\varepsilon)$-approximation algorithm with worst-case sensitivity $O_\varepsilon(1)$, which substantially improves upon the $(1-\varepsilon)$-approximation algorithm of Varma and Yoshida (arXiv 2020) that obtains average sensitivity $n^{O(1/(1+\varepsilon^2))}$, and show a deterministic $1/2$-approximation algorithm with sensitivity $\exp(O(\log^* n))$ for bounded-degree graphs. We show that any deterministic constant-factor approximation algorithm must have sensitivity $\Omega(\log^* n)$. Our results imply that randomized algorithms are strictly more powerful than deterministic ones in that the former can achieve sensitivity independent of $n$ whereas the latter cannot. We also show analogous results for vertex sensitivity, where we remove a vertex instead of an edge. As an application of our results, we give an algorithm for the online maximum matching with $O_\varepsilon(n)$ total replacements in the vertex-arrival model. By comparison, Bernstein et al. (J. ACM 2019) gave an online algorithm that always outputs the maximum matching, but only for bipartite graphs and with $O(n\log n)$ total replacements. Finally, we introduce the notion of normalized weighted sensitivity, a natural generalization of sensitivity that accounts for the weights of deleted edges. We show that if all edges in a graph have polynomially bounded weight, then given a trade-off parameter $\alpha>2$, there exists an algorithm that outputs a $\frac{1}{4\alpha}$-approximation to the maximum weighted matching in $O(m\log_\alpha n)$ time, with normalized weighted sensitivity $O(1)$. See paper for full abstract.

preprint · 2020 · arXiv

Statistical Learning with Conditional Value at Risk

We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss. We devise algorithms based on stochastic gradient descent for this framework. While existing studies of CVaR optimization require direct access to the underlying distribution, our algorithms make a weaker assumption that only i.i.d. samples are given. For convex and Lipschitz loss functions, we show that our algorithm has $O(1/\sqrt{n})$-convergence to the optimal CVaR, where $n$ is the number of samples. For nonconvex and smooth loss functions, we show a generalization bound on CVaR. By conducting numerical experiments on various machine learning tasks, we demonstrate that our algorithms effectively minimize CVaR compared with other baseline algorithms.
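To illustrate the quantity being optimized, the sketch below computes the empirical CVaR of a finite sample of losses in two ways: as the mean of the worst $\alpha$-fraction of losses, and through the Rockafellar-Uryasev variational form $\min_\tau \{\tau + \frac{1}{\alpha n}\sum_i (\ell_i - \tau)_+\}$. It assumes the convention that $\alpha$ is the tail probability and that $\alpha n$ is an integer; the paper may parametrize CVaR differently. It only evaluates CVaR on fixed losses and is not the stochastic-gradient algorithm described in the abstract. Exact rational arithmetic and the function names are choices made for this example.

```python
from fractions import Fraction


def cvar_tail_mean(losses, alpha):
    """Mean of the worst alpha-fraction of losses (alpha * n must be an integer)."""
    k = alpha * len(losses)
    assert k == int(k) and k > 0, "this sketch assumes alpha * n is a positive integer"
    worst = sorted(losses, reverse=True)[: int(k)]
    return Fraction(sum(worst), int(k))


def cvar_variational(losses, alpha):
    """Rockafellar-Uryasev form, minimized over tau.

    The objective is convex and piecewise linear in tau with breakpoints at
    the sample values, so checking tau at each sample value is enough.
    """
    n = len(losses)

    def objective(tau):
        excess = sum(max(x - tau, 0) for x in losses)
        return tau + excess / (alpha * n)

    return min(objective(tau) for tau in losses)


losses = list(range(1, 11))  # toy losses 1, 2, ..., 10
alpha = Fraction(1, 5)       # worst 20% of outcomes
```

Both functions return $19/2$ on this sample: the two worst losses are 9 and 10, and the variational objective attains the same value at $\tau = 8$ and $\tau = 9$. The variational form is the one that lends itself to gradient-based optimization, since $\tau$ can be treated as an additional parameter.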