Researcher profile

Juho Lauri

Juho Lauri contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust: 19 (unverified)
Verification: L1
Unclaimed author
5 works
0 followers
9 topics
4 close collaborators

Published work

5 published items

Preprint · 2020 · arXiv

Finding path motifs in large temporal graphs using algebraic fingerprints

We study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several applications, for example, recommending tours for tourists or detecting abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we consider, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution scales to massive graphs with up to a billion edges for a multiset query with five colors, and up to a hundred million edges for a multiset query with ten colors, despite the problems being NP-hard. Our implementation, which is publicly available, is highly optimized and exhibits practical edge-linear scalability. For instance, on a real-world graph dataset with more than six million edges and a multiset query with ten colors, we can extract an optimum solution in less than eight minutes on a four-core Haswell desktop.
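To make the query semantics concrete, here is a brute-force Python sketch; it is not the constrained multilinear sieving framework described above. It assumes that a temporal path is a sequence of distinct vertices joined by undirected edges with strictly increasing timestamps, and that a match must use exactly the colors in the query multiset. The function name `find_temporal_path_motif` and the `(u, v, t)` edge format are hypothetical. The search is exponential in the query size, so it only suits toy inputs.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical representation: an undirected temporal edge (u, v, t) active at time t.
TemporalEdge = Tuple[Hashable, Hashable, int]


def find_temporal_path_motif(
    edges: List[TemporalEdge],
    color: Dict[Hashable, str],
    query: List[str],
) -> Optional[List[Hashable]]:
    """Brute-force search for a temporal path whose vertex colors equal the
    query multiset. Exponential in len(query); NOT multilinear sieving."""
    k = len(query)
    need = Counter(query)
    # Adjacency list: vertex -> [(neighbor, timestamp), ...]
    adj: Dict[Hashable, List[Tuple[Hashable, int]]] = {}
    for u, v, t in edges:
        adj.setdefault(u, []).append((v, t))
        adj.setdefault(v, []).append((u, t))

    def extend(path: List[Hashable], used: Counter, last_t: float):
        if len(path) == k:
            return list(path)
        for w, t in adj[path[-1]]:
            c = color[w]
            # Timestamps must strictly increase, vertices must not repeat,
            # and no color may be used more often than the query allows.
            if t > last_t and w not in path and used[c] < need[c]:
                used[c] += 1
                path.append(w)
                found = extend(path, used, t)
                if found:
                    return found
                path.pop()  # backtrack
                used[c] -= 1
        return None

    for start in adj:
        if need[color[start]] == 0:
            continue  # the start vertex's color is not in the query
        found = extend([start], Counter({color[start]: 1}), float("-inf"))
        if found:
            return found
    return None
```

With edges x-y at time 5 and y-z at time 3, the walk x, y, z is rejected because its timestamps decrease, while z, y, x (times 3 then 5) matches the query red, blue, green when x is red, y is blue and z is green.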

Preprint · 2020 · arXiv

Learning fine-grained search space pruning and heuristics for combinatorial optimization

Combinatorial optimization problems arise in a wide range of applications from diverse domains. Many of these problems are NP-hard, and designing efficient heuristics for them requires considerable time and experimentation. At the same time, the number of optimization problems in industry continues to grow. In recent years, machine learning techniques have been explored to address this gap. We propose a framework for leveraging machine learning techniques to scale up exact combinatorial optimization algorithms. In contrast to existing approaches based on deep learning, reinforcement learning, and restricted Boltzmann machines that attempt to learn the output of the optimization problem directly from its input (with limited success), our framework learns the relatively simpler task of pruning elements to reduce the size of problem instances. In addition, our framework uses only interpretable learning models based on intuitive features, so the learning process provides deeper insights into the optimization problem and the instance class, which can be used to design better heuristics. For the classical maximum clique enumeration problem, we show that our framework can prune a large fraction of the input graph (around 99% of nodes for sparse graphs) and still detect almost all of the maximum cliques. This yields several-fold speedups of state-of-the-art algorithms. Furthermore, the model used in our framework highlights that the chi-squared value of neighborhood degree has a statistically significant correlation with the presence of a node in a maximum clique, particularly in dense graphs, which constitute a significant challenge for modern solvers. We leverage this insight to design a novel heuristic for this problem that outperforms the state of the art. Our heuristic is also of independent interest for maximum clique detection and enumeration.
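The overall shape of the approach (shrink the instance first, then run an exact solver on what remains) can be sketched in Python. As a plainly swapped-in stand-in for the learned, interpretable pruning model, this sketch uses a classical exact rule: a vertex with fewer than k - 1 neighbors cannot belong to a clique with k vertices, so such vertices are deleted repeatedly before Bron-Kerbosch enumeration. The parameter `lower_bound` is assumed to be at most the clique number (for example, the size of a clique found by a heuristic); all function names are hypothetical.

```python
from typing import Dict, Hashable, Iterator, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: List[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_prune(adj: Graph, k: int) -> Graph:
    """Repeatedly delete vertices of degree < k - 1: they cannot belong to a
    clique with k or more vertices, so this pruning is exact (lossless)."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if len(adj[v]) < k - 1:
                for u in adj[v]:
                    adj[u].discard(v)
                del adj[v]
                changed = True
    return adj


def maximal_cliques(adj: Graph) -> Iterator[Set[Hashable]]:
    """Bron-Kerbosch with pivoting: yields every maximal clique of adj."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        # Pivot on the vertex covering the most candidates to cut branching.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def maximum_cliques(adj: Graph, lower_bound: int) -> List[Set[Hashable]]:
    """All maximum cliques, assuming lower_bound <= clique number of adj."""
    pruned = core_prune(adj, lower_bound)
    if not pruned:
        return []  # lower_bound exceeded the clique number
    cliques = list(maximal_cliques(pruned))
    best = max(len(c) for c in cliques)
    return [c for c in cliques if len(c) == best]
```

Unlike this rule, which can never discard a maximum clique, the learned model described above may miss a few maximum cliques ("almost all" are detected) in exchange for much more aggressive pruning.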

Preprint · 2020 · arXiv

Perfect Italian domination on planar and regular graphs

A perfect Italian dominating function of a graph $G=(V,E)$ is a function $f : V \to \{0,1,2\}$ such that for every vertex $v$ with $f(v) = 0$, it holds that $\sum_{u \in N(v)} f(u) = 2$, i.e., the labels assigned by $f$ to the neighbors of $v$ sum to exactly two. The weight of a perfect Italian dominating function is the sum of the labels of all vertices, $w(f) = \sum_{v \in V} f(v)$. The perfect Italian domination number of $G$, denoted by $\gamma^p_I(G)$, is the minimum weight of any perfect Italian dominating function of $G$. While introducing the parameter, Haynes and Henning (Discrete Appl. Math. (2019), 164--177) also proposed the problem of determining the best possible constants $c_\mathcal{G}$ such that $\gamma^p_I(G) \leq c_\mathcal{G} \times n$ for all graphs $G$ of order $n$ in a particular class $\mathcal{G}$ of graphs. They proved that $c_\mathcal{G} = 1$ when $\mathcal{G}$ is the class of bipartite graphs, and raised the question for planar graphs and regular graphs. We settle their question precisely for planar graphs by proving that $c_\mathcal{G} = 1$, and for cubic graphs by proving that $c_\mathcal{G} = 2/3$. For split graphs, we also show that $c_\mathcal{G} = 1$. In addition, we characterize the graphs $G$ with $\gamma^p_I(G)$ equal to 2 and 3, and determine the exact value of the parameter for several simple structured graphs. We conclude by proving that it is NP-complete to decide whether a given bipartite planar graph admits a perfect Italian dominating function of weight $k$.
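The definition translates directly into a checker. This Python sketch (hypothetical function names; the graph is given as an adjacency dictionary) verifies the perfect Italian condition and computes $\gamma^p_I(G)$ by exhaustive search over all $3^n$ labelings, so it is only practical for very small graphs. It is an illustration of the definition, not an algorithm from the paper.

```python
from itertools import product
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_perfect_italian(adj: Graph, f: Dict[Hashable, int]) -> bool:
    """Every vertex labeled 0 must have neighbor labels summing to exactly 2."""
    return all(
        sum(f[u] for u in adj[v]) == 2
        for v in adj
        if f[v] == 0
    )


def perfect_italian_number(adj: Graph) -> int:
    """Minimum weight over all 3^n labelings V -> {0, 1, 2}; tiny graphs only."""
    vertices = list(adj)
    best = 2 * len(vertices)  # labeling every vertex 2 is always valid
    for labels in product((0, 1, 2), repeat=len(vertices)):
        weight = sum(labels)
        if weight < best and is_perfect_italian(adj, dict(zip(vertices, labels))):
            best = weight
    return best
```

For the complete graph $K_4$, which is cubic, labeling one vertex 2 and the rest 0 is optimal, so the search returns 2, within the bound $\frac{2}{3} \times 4$.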

Preprint · 2020 · arXiv

Towards Quantifying the Distance between Opinions

Increasingly, critical decisions in public policy, governance, and business strategy rely on a deeper understanding of the needs and opinions of constituent members (e.g., citizens, shareholders). While it has become easier to collect a large number of opinions on a topic, automated tools are needed to help navigate the space of opinions. In such contexts, understanding and quantifying the similarity between opinions is key. We find that measures based solely on text similarity or on overall sentiment often fail to effectively capture the distance between opinions. Thus, we propose a new distance measure for capturing the similarity between opinions that leverages the nuanced observation that similar opinions express similar sentiment polarity on specific relevant entities-of-interest. In an unsupervised setting, our distance measure achieves significantly better Adjusted Rand Index scores (up to 56x) and Silhouette coefficients (up to 21x) compared to existing approaches. Similarly, in a supervised setting, our opinion distance measure achieves considerably better accuracy (up to a 20% increase) compared to extant approaches that rely on text similarity, stance similarity, and sentiment similarity.
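The abstract does not spell out the measure, so the following Python sketch is only a hypothetical illustration of the underlying observation, not the authors' distance. It assumes each opinion has already been reduced to a mapping from entities-of-interest to a sentiment polarity in [-1, 1] (the extraction step is out of scope), treats an entity that an opinion does not mention as neutral, and averages the per-entity polarity gaps. The names `entity_opinion_distance` and `overall_sentiment` are hypothetical.

```python
from typing import Dict

# Hypothetical representation: entity-of-interest -> sentiment polarity in [-1, 1].
Opinion = Dict[str, float]


def overall_sentiment(op: Opinion) -> float:
    """Baseline: average polarity, ignoring which entity it was expressed about."""
    return sum(op.values()) / len(op) if op else 0.0


def entity_opinion_distance(a: Opinion, b: Opinion) -> float:
    """Mean per-entity polarity gap, scaled to [0, 1]; an entity that an
    opinion does not mention is treated as neutral (0.0)."""
    entities = set(a) | set(b)
    if not entities:
        return 0.0
    gap = sum(abs(a.get(e, 0.0) - b.get(e, 0.0)) for e in entities)
    return gap / (2 * len(entities))  # each gap is at most 2
```

For example, `{"tax": 1.0, "parks": -1.0}` and `{"tax": -1.0, "parks": 1.0}` both have overall sentiment 0.0, so a sentiment-only measure sees no difference, while `entity_opinion_distance` returns its maximum value 1.0 because the two opinions disagree on every entity.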

Preprint · 2020 · arXiv

Upper Bounding Rainbow Connection Number by Forest Number

A path in an edge-colored graph is rainbow if no two of its edges are colored the same, and the graph is rainbow-connected if there is a rainbow path between each pair of its vertices. The minimum number of colors needed to rainbow-connect a graph $G$ is the rainbow connection number of $G$, denoted by $\text{rc}(G)$. A simple way to rainbow-connect a connected graph $G$ is to color the edges of a spanning tree with distinct colors and then reuse any of these colors to color the remaining edges of $G$. This proves that $\text{rc}(G) \le |V(G)|-1$. We ask whether there is a stronger connection between tree-like structures and rainbow coloring than is implied by the above trivial argument. For instance, is it possible to find an upper bound of $t(G)-1$ for $\text{rc}(G)$, where $t(G)$ is the number of vertices in the largest induced tree of $G$? The answer turns out to be negative, as there are counter-examples showing that even $c\cdot t(G)$ is not an upper bound for $\text{rc}(G)$ for any given constant $c$. In this work we show that if we consider the forest number $f(G)$, the number of vertices in a maximum induced forest of $G$, instead of $t(G)$, then surprisingly we do get an upper bound. More specifically, we prove that $\text{rc}(G) \leq f(G) + 2$. Our result indicates a stronger connection between rainbow connection and tree-like structures than was suggested by the simple spanning-tree-based upper bound.
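The trivial spanning-tree argument and the definition of rainbow connectivity can both be checked mechanically. This Python sketch (hypothetical names; a connected graph as an adjacency dictionary, colors keyed by `frozenset` edges) builds the $(|V(G)|-1)$-color spanning-tree coloring and tests whether a given coloring rainbow-connects the graph by searching over (vertex, set of used colors) states. The check is exponential in the number of colors and is meant only for small examples; it does not compute $\text{rc}(G)$ or $f(G)$.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]
Coloring = Dict[FrozenSet[Hashable], int]  # undirected edge {u, v} -> color


def spanning_tree_coloring(adj: Graph) -> Coloring:
    """Trivial bound rc(G) <= |V(G)| - 1: give the edges of a BFS spanning tree
    distinct colors, then reuse color 0 on every non-tree edge.
    Assumes adj is connected and non-empty."""
    root = next(iter(adj))
    color: Coloring = {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                color[frozenset((u, v))] = len(color)  # next unused color
                queue.append(v)
    for u in adj:
        for v in adj[u]:
            color.setdefault(frozenset((u, v)), 0)  # non-tree edges reuse color 0
    return color


def is_rainbow_connected(adj: Graph, color: Coloring) -> bool:
    """From every source, search over (vertex, colors-used) states. A walk with
    distinct edge colors can be shortcut to a rainbow path, so walks suffice."""
    for s in adj:
        start = (s, frozenset())
        visited = {start}
        reached = {s}
        stack = [start]
        while stack:
            u, used = stack.pop()
            for v in adj[u]:
                c = color[frozenset((u, v))]
                if c in used:
                    continue  # reusing a color would break the rainbow property
                state = (v, used | {c})
                if state not in visited:
                    visited.add(state)
                    reached.add(v)
                    stack.append(state)
        if len(reached) < len(adj):
            return False
    return True
```

On the 4-cycle, the spanning-tree coloring uses $3 = |V(G)|-1$ colors, whereas alternating two colors around the cycle also passes the check, so the trivial bound is not tight there.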