Source author record

Chuan Xiao

Chuan Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Databases, Information Retrieval, Data Structures and Algorithms, Machine Learning, Performance

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

An Empirical Study of Personalized Federated Learning

Federated learning is a distributed machine learning approach in which a single server and multiple clients collaboratively build machine learning models without sharing the clients' datasets. A challenging issue in federated learning is data heterogeneity (i.e., data distributions may differ across clients). To cope with this issue, numerous federated learning methods aim at personalized federated learning and build optimized models for individual clients. Whereas existing studies empirically evaluated their own methods, the experimental settings (e.g., comparison methods, datasets, and client settings) in these studies differ from each other, and it is unclear which personalized federated learning method achieves the best performance and how much progress can be made by using these methods instead of standard (i.e., non-personalized) federated learning. In this paper, we benchmark the performance of existing personalized federated learning methods through comprehensive experiments to evaluate the characteristics of each method. Our experimental study shows that (1) there are no champion methods, (2) large data heterogeneity often leads to highly accurate predictions, and (3) standard federated learning methods (e.g., FedAvg) with fine-tuning often outperform personalized federated learning methods. We open our benchmark tool FedBench for researchers to conduct experimental studies with various experimental settings.
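To make finding (3) concrete, here is a minimal sketch of FedAvg followed by per-client fine-tuning. It is not taken from FedBench: the one-parameter model y ≈ w·x, the learning rate, the step counts, and all function names are assumptions chosen for illustration. The only part meant to reflect standard FedAvg is the server step, which averages client models weighted by client dataset size.

```python
def local_update(w, data, lr=0.1, steps=5):
    """Run a few gradient steps of least squares (y ~ w * x) on one client's data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(w_global, clients):
    """One FedAvg round: local training, then a size-weighted average on the server."""
    total = sum(len(data) for data in clients)
    return sum(len(data) * local_update(w_global, data) for data in clients) / total


def train(clients, rounds=20):
    """Train a single global model, starting from w = 0."""
    w = 0.0
    for _ in range(rounds):
        w = fedavg_round(w, clients)
    return w


def fine_tune(w_global, data, steps=20):
    """Personalize the global model by continuing training on one client's data."""
    return local_update(w_global, data, steps=steps)


def loss(w, data):
    """Mean squared error of the model w on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Two clients with heterogeneous data: slopes 1 and 3 (toy values).
client_a = [(1.0, 1.0), (2.0, 2.0)]
client_b = [(1.0, 3.0), (2.0, 6.0)]
w_global = train([client_a, client_b])
w_a = fine_tune(w_global, client_a)
```

In this toy setting the global model settles between the two client slopes, and fine-tuning moves it toward the slope of the client it is tuned on, which lowers that client's local loss.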

Preprint, 2022, arXiv

Similarity Search on Computational Notebooks

Computational notebook software such as Jupyter Notebook is popular for data science tasks. Numerous computational notebooks are available on the Web and reusable; however, searching for computational notebooks manually is a tedious task, and so far, there are no tools to search for computational notebooks effectively and efficiently. In this paper, we propose a similarity search on computational notebooks and develop a new framework for the similarity search. Given contents (i.e., source code, tabular data, libraries, and output formats) in computational notebooks as a query, the similarity search problem aims to find the top-k computational notebooks with the most similar contents. We define two similarity measures: set-based and graph-based similarities. Set-based similarity handles each content independently, while graph-based similarity captures the relationships between contents. Our framework can effectively prune candidate notebooks that cannot be in the top-k results. Furthermore, we develop optimization techniques such as caching and indexing to accelerate the search. Experiments using Kaggle notebooks show that our method, in particular graph-based similarity, can achieve high accuracy and high efficiency.
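The abstract does not define the set-based measure, so the sketch below is only one plausible reading. It assumes each notebook is a dictionary of content types mapped to sets of tokens, scores each content type independently with Jaccard similarity, averages the scores, and returns the top-k by brute force. The data layout, the Jaccard choice, the averaging, and all names are assumptions. It does not include the pruning, caching, or indexing that the paper describes.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def set_based_similarity(query, notebook):
    """Average per-content Jaccard score; each content type is scored independently."""
    if not query:
        return 0.0
    scores = [jaccard(tokens, notebook.get(kind, ())) for kind, tokens in query.items()]
    return sum(scores) / len(scores)


def top_k(query, notebooks, k):
    """Return the names of the k notebooks most similar to the query (brute force)."""
    return heapq.nlargest(
        k, notebooks, key=lambda name: set_based_similarity(query, notebooks[name])
    )


# Hypothetical toy corpus.
notebooks = {
    "a": {"libraries": {"pandas", "numpy"}, "code": {"read_csv", "groupby"}},
    "b": {"libraries": {"pandas"}, "code": {"read_csv"}},
    "c": {"libraries": {"torch"}, "code": {"train"}},
}
query = {"libraries": {"pandas", "numpy"}, "code": {"read_csv", "groupby"}}
best = top_k(query, notebooks, 2)
```

A graph-based measure would additionally need the relationships between contents, which this flat representation deliberately leaves out.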

Preprint, 2020, arXiv

Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints

In this paper, we address a similarity search problem for spatial trajectories in road networks. In particular, we focus on the subtrajectory similarity search problem, which involves finding in a database the subtrajectories similar to a query trajectory. A key feature of our approach is that we do not focus on a specific similarity function; instead, we consider weighted edit distance (WED), a class of similarity functions which allows user-defined cost functions and hence includes several important similarity functions such as EDR and ERP. We model trajectories as strings, and propose a generic solution which is able to deal with any similarity function belonging to the class of WED. By employing the filter-and-verify strategy, we introduce subsequence filtering to efficiently prune trajectories and find candidates. In order to choose a proper subsequence to optimize the candidate number, we model the choice as a discrete optimization problem (NP-hard) and compute it using a 2-approximation algorithm. To verify candidates, we design bidirectional tries, with which the verification starts from promising positions and leverages the shared segments of trajectories and the sparsity of road networks for speed-up. Experiments are conducted on large datasets to demonstrate the effectiveness of WED and the efficiency of our method for various similarity functions under WED.

Preprint, 2020, arXiv

Pigeonring: A Principle for Faster Thresholded Similarity Search

The pigeonhole principle states that if $n$ items are contained in $m$ boxes, then at least one box has no more than $n / m$ items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we propose a new principle, called the pigeonring principle, which constrains the number of items in multiple boxes and yields stronger conditions. To utilize the new principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query are constrained by a threshold. Many solutions to these problems utilize the pigeonhole principle to find candidates that satisfy a filtering condition. By the new principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the new principle. This suggests that all the pigeonhole principle-based solutions can potentially be accelerated by the new principle. A universal filtering framework is introduced to encompass the solutions to these problems based on the new principle. Besides, we discuss how to quickly find candidates specified by the new principle. The implementation requires only minor modifications on top of existing pigeonhole principle-based algorithms. Experimental results on real datasets demonstrate the applicability of the new principle as well as the superior performance of the algorithms based on the new principle.
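As a rough illustration of the classical pigeonhole filter that the abstract takes as its baseline (not the pigeonring method itself), the sketch below applies it to Hamming distance search over equal-length strings. If two strings differ in at most tau positions and each is split into m aligned parts, at least one pair of parts differs in at most tau // m positions. The function names, the choice of Hamming distance, and the even partitioning are assumptions made for this example.

```python
def split_parts(s, m):
    """Split s into m contiguous parts whose lengths differ by at most one."""
    base, extra = divmod(len(s), m)
    parts, start = [], 0
    for i in range(m):
        end = start + base + (1 if i < extra else 0)
        parts.append(s[start:end])
        start = end
    return parts


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_candidate(query, record, tau, m):
    """Pigeonhole filter: keep a record only if some part has <= tau // m mismatches."""
    if len(query) != len(record):
        return False
    pairs = zip(split_parts(query, m), split_parts(record, m))
    return any(hamming(q, r) <= tau // m for q, r in pairs)


def threshold_search(query, records, tau, m):
    """Filter-and-verify: filter with the pigeonhole condition, then check exactly."""
    candidates = [r for r in records if is_candidate(query, r, tau, m)]
    return [r for r in candidates if hamming(query, r) <= tau]


# Hypothetical toy data.
records = ["abcdefgh", "abcdxxxx", "xbcdefgx", "xxxxxxxx"]
hits = threshold_search("abcdefgh", records, tau=2, m=2)
```

Here "abcdxxxx" passes the filter because its first half matches exactly, yet it fails verification with four mismatches. False positives of this kind are what a condition spanning several boxes is meant to reduce.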
