Researcher profile

Jianzhong Li

Jianzhong Li contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
10 works
0 followers
10 topics
4 close collaborators

Published work

10 published items

preprint | 2024 | arXiv

The PCP-like Theorem for Sub-linear Time Inapproximability

In this paper we propose the PCP-like theorem for sub-linear time inapproximability. Abboud et al. devised the distributed PCP framework for proving sub-quadratic time inapproximability. Here we go further in this direction. Starting from SETH, we first identify a problem, denoted Ext-$k$-SAT, that cannot be computed in linear time, and then devise an efficient MA-like protocol for it. To use this protocol to prove the sub-linear time inapproximability of other problems, we devise a new kind of reduction, denoted Ext-reduction, which differs from existing reduction techniques. We also define two new hardness classes whose problems can be computed in linear time but cannot be efficiently approximated in sub-linear time. Some problems are shown to be in the newly defined hardness classes.

preprint | 2022 | arXiv

A New Model for Massively Parallel Computation Considering both Communication and IO Cost

In parallel computation research, communication cost has been studied extensively, while IO cost has been neglected. For big data computation, the assumption that the data fits in main memory no longer holds, and external memory must be used. It is therefore necessary to bring IO cost into the parallel computation model. In this paper, we propose the first parallel computation model that takes IO cost as well as non-uniform communication cost into consideration. Based on the new model, we pose several new problems that aim to minimize the IO and communication cost. We prove the hardness of these new problems, then design and analyze approximation algorithms for solving them.

preprint | 2022 | arXiv

Dynamic Approximate Maximum Independent Set on Massive Graphs

Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, with important applications in a wide spectrum of fields. Since graphs in many applications change frequently over time, the problem of maintaining a MaxIS over dynamic graphs has attracted increasing attention over the past few years. Due to the intractability of maintaining an exact MaxIS, this paper aims to develop efficient algorithms that maintain an approximate MaxIS with a theoretical accuracy guarantee. In particular, we propose a framework that maintains a $(\frac{\Delta}{2} + 1)$-approximate MaxIS over dynamic graphs and prove that it achieves a constant approximation ratio in many real-world networks. To the best of our knowledge, this is the first non-trivial approximability result for the dynamic MaxIS problem. Following the framework, we implement an efficient linear-time dynamic algorithm and a more effective dynamic algorithm with near-linear expected time complexity. Our thorough experiments over real and synthetic graphs demonstrate the effectiveness and efficiency of the proposed algorithms, especially when the graph is highly dynamic.

preprint | 2022 | arXiv

PCP Theorems, SETH and More: Towards Proving Sub-linear Time Inapproximability

In this paper we propose the PCP-like theorem for sub-linear time inapproximability. Abboud et al. have devised the distributed PCP framework for sub-quadratic time inapproximability. We show that the distributed PCP theorem can be generalized for proving arbitrary polynomial time inapproximability, but fails in the linear case. We prove the sub-linear PCP theorem by adapting from an MA-protocol for the Set Containment problem, and show how to use the theorem to prove both existing and new inapproximability results, exhibiting the power of the sub-linear PCP theorem. Considering the emerging research works on sub-linear time algorithms, the sub-linear PCP theorem is important in guiding the research in sub-linear time approximation algorithms.

preprint | 2022 | arXiv

Rank-Regret Minimization

Multi-criteria decision-making often requires finding a small representative set from the database. A recently proposed method is the regret minimization set (RMS) query. RMS returns a size $r$ subset $S$ of dataset $D$ that minimizes the regret-ratio (the difference between the score of top-1 in $S$ and the score of top-1 in $D$, for any possible utility function). RMS is not shift invariant, causing inconsistency in results. Further, existing work showed that the regret-ratio is often a made-up number and users may mistake its absolute value. Instead, users do understand the notion of rank. Thus it considered the problem of finding the minimal set $S$ with a rank-regret (the rank of top-1 tuple of $S$ in the sorted list of $D$) at most $k$, called the rank-regret representative (RRR) problem. Corresponding to RMS, we focus on the min-error version of RRR, called the rank-regret minimization (RRM) problem, which finds a size $r$ set to minimize the maximum rank-regret for all utility functions. Further, we generalize RRM and propose the restricted RRM (i.e., RRRM) problem to optimize the rank-regret for functions restricted in a given space. Previous studies on both RMS and RRR did not consider the restricted function space. The solution for RRRM usually has a lower regret level and can better serve the specific preferences of some users. Note that RRM and RRRM are shift invariant. In 2D space, we design a dynamic programming algorithm 2DRRM to return the optimal solution for RRM. In HD space, we propose an algorithm HDRRM that introduces a double approximation guarantee on rank-regret. Both 2DRRM and HDRRM are applicable for RRRM. Extensive experiments on the synthetic and real datasets verify the efficiency and effectiveness of our algorithms. In particular, HDRRM always has the best output quality in experiments.
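
To make the rank-regret definition above concrete, here is a minimal sketch that evaluates the rank-regret of a candidate subset under linear utility functions. It is not the 2DRRM or HDRRM algorithm from the abstract. The function names, the toy data, and the choice of a small hand-picked list of weight vectors are all assumptions made for illustration, and the sketch assumes higher scores are better.

```python
from typing import Sequence

Point = Sequence[float]


def score(point: Point, weights: Sequence[float]) -> float:
    """Linear utility: weighted sum of the attribute values."""
    return sum(p * w for p, w in zip(point, weights))


def rank_regret(dataset: Sequence[Point], subset: Sequence[Point],
                weights: Sequence[float]) -> int:
    """1-based rank in `dataset` of the best tuple of `subset` under `weights`."""
    best_in_subset = max(score(p, weights) for p in subset)
    # Every dataset tuple that strictly beats the subset's best pushes the rank down.
    strictly_better = sum(1 for p in dataset if score(p, weights) > best_in_subset)
    return strictly_better + 1


def max_rank_regret(dataset: Sequence[Point], subset: Sequence[Point],
                    utility_samples: Sequence[Sequence[float]]) -> int:
    """Worst rank-regret over the sampled utility functions only."""
    return max(rank_regret(dataset, subset, w) for w in utility_samples)


# Hypothetical toy data: four 2D tuples and a one-tuple representative set.
D = [(1.0, 0.0), (0.8, 0.5), (0.5, 0.8), (0.0, 1.0)]
S = [(0.8, 0.5)]
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(max_rank_regret(D, S, samples))  # prints 3
```

Because only a finite sample of utility functions is checked, the value returned is a lower bound on the true maximum rank-regret over all utility functions.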

preprint | 2022 | arXiv

Turing Machines with Two-level Memory: A Deep Look into the Input/Output Complexity

The input/output complexity, which is the complexity of data exchange between main memory and external memory, has been studied extensively. However, existing work has not considered input/output complexity from the point of view of a computation model. In this paper we remedy this by proposing three variants of the Turing machine that include external memory and a mechanism for exchanging data between main memory and external memory. Based on these new models, we study input/output complexity in depth. We discuss the relationship between input/output complexity and other complexity measures such as time complexity and parameterized complexity, which has not been considered in earlier work. We also define the external access trace complexity, which reflects the physical behavior of magnetic disks and provides theoretical evidence for IO-efficient algorithms.

preprint | 2020 | arXiv

A Sub-linear Time Algorithm for Approximating k-Nearest-Neighbor with Full Quality Guarantee

In this paper we propose an algorithm for the approximate k-Nearest-Neighbors problem. In the existing literature there are two kinds of approximation criteria: the distance criterion and the recall criterion. All earlier algorithms suffer from a lack of theoretical guarantees for the two approximation criteria. The algorithm proposed in this paper unifies the two kinds of approximation criteria and has full theoretical guarantees. Furthermore, the query time of the algorithm is sub-linear. As far as we know, it is the first algorithm that achieves both sub-linear query time and a full theoretical approximation guarantee.
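
The abstract does not define the two criteria, so the following sketch shows one common way to measure them, purely as an illustration. It assumes recall is the fraction of the exact k nearest neighbors that the approximate answer contains, and that the distance criterion compares the farthest returned point with the exact k-th nearest distance. The function names and the toy points are hypothetical, and the brute-force search here is a reference baseline, not the sub-linear algorithm the paper proposes.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_exact(points: Sequence[Point], query: Point, k: int) -> List[int]:
    """Brute-force exact k nearest neighbors, returned as indices into `points`."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


def recall(returned: Sequence[int], exact: Sequence[int]) -> float:
    """Fraction of the exact neighbors that appear in the returned set."""
    return len(set(returned) & set(exact)) / len(exact)


def distance_ratio(points: Sequence[Point], query: Point,
                   returned: Sequence[int], exact: Sequence[int]) -> float:
    """Farthest returned distance divided by the exact k-th nearest distance.

    Assumes the exact k-th nearest distance is non-zero.
    """
    worst_returned = max(math.dist(points[i], query) for i in returned)
    kth_exact = max(math.dist(points[i], query) for i in exact)
    return worst_returned / kth_exact


# Hypothetical toy data: four points on a line, query at the origin, k = 2.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
query = (0.0, 0.0)
exact = knn_exact(points, query, k=2)   # indices [0, 1]
approx = [0, 2]                         # a made-up approximate answer
print(recall(approx, exact), distance_ratio(points, query, approx, exact))
```

On this toy input the made-up answer has recall 0.5 and distance ratio 2.0, which shows that the two measures capture different kinds of error.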

preprint | 2020 | arXiv

Auto-Model: Utilizing Research Papers and HPO Techniques to Deal with the CASH problem

In many fields, a large number of algorithms with completely different hyperparameters have been developed to address the same type of problem. Choosing the algorithm and hyperparameter setting correctly can greatly improve overall performance, but users often fail to do so due to a lack of knowledge. How to help users effectively and quickly select a suitable algorithm and hyperparameter setting for a given task instance is an important research topic, known as the CASH problem. In this paper, we design the Auto-Model approach, which makes full use of known information in related research papers and introduces hyperparameter optimization techniques to solve the CASH problem effectively. Auto-Model greatly reduces the cost of algorithm implementation and the size of the hyperparameter configuration space, and is thus capable of dealing with the CASH problem efficiently and easily. To demonstrate the benefit of Auto-Model, we compare it with the classical Auto-Weka approach. The experimental results show that our proposed approach provides superior results and achieves better performance in a short time.

preprint | 2020 | arXiv

Complexity and Efficient Algorithms for Data Inconsistency Evaluating and Repairing

Data inconsistency evaluation and repair are major concerns in data quality management. As a basic computing task, optimal subset repair is not only applied for cost estimation during database repair, but also used directly to derive an evaluation of database inconsistency. Computing an optimal subset repair means finding a minimum set of tuples in an inconsistent database whose removal leaves a consistent subset. Tight bounds on the complexity and efficient algorithms are still unknown. In this paper, we improve the existing complexity and algorithmic results, together with a fast estimation of the size of an optimal subset repair. First, we strengthen the dichotomy for the optimal subset repair computation problem: we show that it is not only APX-complete, but also NP-hard to approximate an optimal subset repair within a factor better than $17/16$ in most cases. Second, we show a $(2-0.5^{\sigma-1})$-approximation when given $\sigma$ functional dependencies, and a $(2-\eta_k+\frac{\eta_k}{k})$-approximation when an $\eta_k$-portion of tuples have the $k$-quasi-Turán property for some $k>1$. Finally, we show a sublinear estimator for the size of an optimal $S$-repair for subset queries; it outputs an estimation of a ratio $2n+\epsilon n$ with high probability, thus deriving an estimation of the FD-inconsistency degree of a ratio $2+\epsilon$. To support a variety of subset queries for FD-inconsistency evaluation, we unify them as the $\subseteq$-oracle, which can answer membership queries and return $p$ uniformly sampled tuples when given a number $p$. Experiments are conducted on range queries as an implementation of the $\subseteq$-oracle, and the results show the efficiency of our FD-inconsistency degree estimator.
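
As background for the approximation ratios quoted above, the sketch below shows the classical factor-2 baseline for subset repair under functional dependencies. It is not the paper's $(2-0.5^{\sigma-1})$-approximation. It relies on the standard observation that FD violations are pairwise, so a subset repair corresponds to a vertex cover of the conflict graph, and a maximal matching yields a cover at most twice the optimum. The function names, the dictionary-based tuple representation, and the toy relation are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]
FD = Tuple[Sequence[str], Sequence[str]]  # (left-hand side, right-hand side)


def conflicts(t1: Row, t2: Row, fds: Sequence[FD]) -> bool:
    """True if the two tuples jointly violate at least one functional dependency."""
    for lhs, rhs in fds:
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        differ_on_rhs = any(t1[a] != t2[a] for a in rhs)
        if agree_on_lhs and differ_on_rhs:
            return True
    return False


def approx_subset_repair(rows: List[Row], fds: Sequence[FD]) -> Set[int]:
    """Indices of tuples to delete, at most twice the minimum number.

    Greedily builds a maximal matching in the conflict graph and deletes
    both endpoints of every matched conflict edge.
    """
    removed: Set[int] = set()
    for i, j in combinations(range(len(rows)), 2):
        if i in removed or j in removed:
            continue  # this conflict, if any, is already covered
        if conflicts(rows[i], rows[j], fds):
            removed.update((i, j))
    return removed


# Hypothetical toy relation with the single FD zip -> city.
rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "SF"},
]
fds = [(("zip",), ("city",))]
to_delete = approx_subset_repair(rows, fds)
print(sorted(to_delete))  # prints [0, 1]
```

On this toy relation the optimal repair deletes only the tuple at index 1, while the matching-based sketch deletes two tuples, which matches the factor-2 worst case. The pairwise scan is quadratic in the number of tuples and is meant only to illustrate the idea.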

preprint | 2020 | arXiv

PHOTOPiC: Calculate photo-ionization functions and model coefficients for gas discharge simulations

A program to compute photo-ionization functions and fitting parameters for an efficient photo-ionization model is presented. The code integrates the product of the spectrum emission intensity, the photo-ionization yield and the absorption coefficient to calculate the photo-ionization function of each gas and the total photo-ionization function of the mixture. The coefficients of the Helmholtz photo-ionization model are obtained by fitting the total photo-ionization function. A database consisting of $\rm N_2$, $\rm O_2$, $\rm CO_2$ and $\rm H_2O$ molecules is included and can be modified by users. The program provides more accurate photo-ionization functions and source terms for plasma fluid models.