Source author record

Hiroki Arimura

Hiroki Arimura appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Databases Machine Learning

Catalog footprint

What is connected

6works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Cartesian Tree Subsequence Matching

Park et al. [TCS 2020] observed that the similarity between two (numerical) strings can be captured by the Cartesian trees: The Cartesian tree of a string is a binary tree recursively constructed by picking up the smallest value of the string as the root of the tree. Two strings of equal length are said to Cartesian-tree match if their Cartesian trees are isomorphic. Park et al. [TCS 2020] introduced the following Cartesian tree substring matching (CTMStr) problem: Given a text string $T$ of length $n$ and a pattern string of length $m$, find every consecutive substring $S = T[i..j]$ of a text string $T$ such that $S$ and $P$ Cartesian-tree match. They showed how to solve this problem in $\tilde{O}(n+m)$ time. In this paper, we introduce the Cartesian tree subsequence matching (CTMSeq) problem, that asks to find every minimal substring $S = T[i..j]$ of $T$ such that $S$ contains a subsequence $S'$ which Cartesian-tree matches $P$. We prove that the CTMSeq problem can be solved efficiently, in $O(m n p(n))$ time, where $p(n)$ denotes the update/query time for dynamic predecessor queries. By using a suitable dynamic predecessor data structure, we obtain $O(mn \log \log n)$-time and $O(n \log m)$-space solution for CTMSeq. This contrasts CTMSeq with closely related order-preserving subsequence matching (OPMSeq) which was shown to be NP-hard by Bose et al. [IPL 1998].

preprint2022arXiv

Computing the Collection of Good Models for Rule Lists

Since the seminal paper by Breiman in 2001, who pointed out a potential harm of prediction multiplicities from the view of explainable AI, global analysis of a collection of all good models, also known as a `Rashomon set,' has been attracted much attention for the last years. Since finding such a set of good models is a hard computational problem, there have been only a few algorithms for the problem so far, most of which are either approximate or incomplete. To overcome this difficulty, we study efficient enumeration of all good models for a subclass of interpretable models, called rule lists. Based on a state-of-the-art optimal rule list learner, CORELS, proposed by Angelino et al. in 2017, we present an efficient enumeration algorithm CorelsEnum for exactly computing a set of all good models using polynomial space in input size, given a dataset and a error tolerance from an optimal model. By experiments with the COMPAS dataset on recidivism prediction, our algorithm CorelsEnum successfully enumerated all of several tens of thousands of good rule lists of length at most $\ell = 3$ in around 1,000 seconds, while a state-of-the-art top-$K$ rule list learner based on Lawler's method combined with CORELS, proposed by Hara and Ishihata in 2018, found only 40 models until the timeout of 6,000 seconds. For global analysis, we conducted experiments for characterizing the Rashomon set, and observed large diversity of models in predictive multiplicity and fairness of models.

preprint2020arXiv

Efficient Constrained Pattern Mining Using Dynamic Item Ordering for Explainable Classification

Learning of interpretable classification models has been attracting much attention for the last few years. Discovery of succinct and contrasting patterns that can highlight the differences between the two classes is very important. Such patterns are useful for human experts, and can be used to construct powerful classifiers. In this paper, we consider mining of minimal emerging patterns from high-dimensional data sets under a variety of constraints in a supervised setting. We focus on an extension in which patterns can contain negative items that designate the absence of an item. In such a case, a database becomes highly dense, and it makes mining more challenging since popular pattern mining techniques such as fp-tree and occurrence deliver do not efficiently work. To cope with this difficulty, we present an efficient algorithm for mining minimal emerging patterns by combining two techniques: dynamic variable-ordering during pattern search for enhancing pruning effect, and the use of a pointer-based dynamic data structure, called dancing links, for efficiently maintaining occurrence lists. Experiments on benchmark data sets showed that our algorithm achieves significant speed-ups over emerging pattern mining approach based on LCM, a very fast depth-first frequent itemset miner using static variable-ordering.

preprint2019arXiv

An Efficient Algorithm for Enumerating Chordal Bipartite Induced Subgraphs in Sparse Graphs

In this paper, we propose a characterization of chordal bipartite graphs and an efficient enumeration algorithm for chordal bipartite induced subgraphs. A chordal bipartite graph is a bipartite graph without induced cycles with length six or more. It is known that the incident graph of a hypergraph is chordal bipartite graph if and only if the hypergraph is $β$-acyclic. As the main result of our paper, we show that a graph $G$ is chordal bipartite if and only if there is a special vertex elimination ordering for $G$, called CBEO. Moreover, we propose an algorithm ECB which enumerates all chordal bipartite induced subgraphs in $O(ktΔ^2)$ time per solution on average, where $k$ is the degeneracy, $t$ is the maximum size of $K_{t,t}$ as an induced subgraph, and $Δ$ is the degree. ECB achieves constant amortized time enumeration for bounded degree graphs.

preprint2018arXiv

Efficient Enumeration of Dominating Sets for Sparse Graphs

A dominating set $D$ of a graph $G$ is a set of vertices such that any vertex in $G$ is in $D$ or its neighbor is in $D$. Enumeration of minimal dominating sets in a graph is one of central problems in enumeration study since enumeration of minimal dominating sets corresponds to enumeration of minimal hypergraph transversal. However, enumeration of dominating sets including non-minimal ones has not been received much attention. In this paper, we address enumeration problems for dominating sets from sparse graphs which are degenerate graphs and graphs with large girth, and we propose two algorithms for solving the problems. The first algorithm enumerates all the dominating sets for a $k$-degenerate graph in $O(k)$ time per solution using $O(n + m)$ space, where $n$ and $m$ are respectively the number of vertices and edges in an input graph. That is, the algorithm is optimal for graphs with constant degeneracy such as trees, planar graphs, $H$-minor free graphs with some fixed $H$. The second algorithm enumerates all the dominating sets in constant time per solution for input graphs with girth at least nine.

preprint2014arXiv

Efficient Enumeration of Induced Subtrees in a K-Degenerate Graph

In this paper, we address the problem of enumerating all induced subtrees in an input k-degenerate graph, where an induced subtree is an acyclic and connected induced subgraph. A graph G = (V, E) is a k-degenerate graph if for any its induced subgraph has a vertex whose degree is less than or equal to k, and many real-world graphs have small degeneracies, or very close to small degeneracies. Although, the studies are on subgraphs enumeration, such as trees, paths, and matchings, but the problem addresses the subgraph enumeration, such as enumeration of subgraphs that are trees. Their induced subgraph versions have not been studied well. One of few example is for chordless paths and cycles. Our motivation is to reduce the time complexity close to O(1) for each solution. This type of optimal algorithms are proposed many subgraph classes such as trees, and spanning trees. Induced subtrees are fundamental object thus it should be studied deeply and there possibly exist some efficient algorithms. Our algorithm utilizes nice properties of k-degeneracy to state an effective amortized analysis. As a result, the time complexity is reduced to O(k) time per induced subtree. The problem is solved in constant time for each in planar graphs, as a corollary.