Researcher profile

Haokun Li

Haokun Li contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
7 works
0 followers
6 topics
4 close collaborators

Published work

7 published items

preprint · 2024 · arXiv

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

In large-scale software development, the demand for dynamic and multifaceted static code analysis exceeds the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data computation task, supporting scans of over 10 billion lines of code daily and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types designed specifically for Code Change, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, utilizing a unique two-tiered schema, COREF, to convert source code into data facts. Through Godel, a distinctive language, CodeFuse-Query enables the formulation of complex tasks as logical expressions, harnessing Datalog's declarative power. This paper provides empirical evidence of CodeFuse-Query's transformative approach, demonstrating its robustness, scalability, and efficiency. We also highlight its real-world impact and diverse applications, emphasizing its potential to reshape the landscape of static code analysis in the context of large-scale software development. Furthermore, in the spirit of collaboration and advancing the field, our project is open-sourced and the repository is available for public access.
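The idea of treating code analysis as a query over data facts can be illustrated with a minimal sketch. It is not COREF or Godel: the `calls` relation, the function names in it, and the `transitive_closure` helper are all hypothetical, and the evaluation is a naive Datalog-style fixpoint written in plain Python.

```python
def transitive_closure(edges):
    """Naive fixpoint for the Datalog-style rules
         reach(a, b) :- calls(a, b).
         reach(a, c) :- reach(a, b), calls(b, c).
    """
    reach = set(edges)
    while True:
        # Join reach(a, b) with calls(b, c) to derive reach(a, c).
        derived = {(a, c) for (a, b) in reach for (b2, c) in edges if b == b2}
        new = derived - reach
        if not new:
            return reach
        reach |= new


# Hypothetical facts that an extractor might emit from source code.
calls = {("main", "parse"), ("parse", "lex"), ("lex", "read_char")}
reach = transitive_closure(calls)

# Query: which functions can `main` reach, directly or indirectly?
reachable_from_main = sorted(callee for (caller, callee) in reach if caller == "main")
```

Once the facts exist, a new analysis is only a new query over them, which is the sense in which extracted data can be reused across tasks.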

preprint · 2023 · arXiv

Isolating Bounded and Unbounded Real Roots of a Mixed Trigonometric-Polynomial

Mixed trigonometric-polynomials (MTPs) are functions of the form $f(x,\sin{x}, \cos{x})$ with $f\in\mathbb{Q}[x_1,x_2,x_3]$. In this paper, an algorithm "isolating" all the real roots of an MTP is provided and implemented. It automatically divides the real roots into two parts: one consists of finitely many "bounded" roots in an interval $[\mu_-,\mu_+]$ while the other consists of possibly countably many "periodic" roots in $\mathbb{R}\backslash[\mu_-,\mu_+]$. For bounded roots, the algorithm returns isolating intervals and corresponding multiplicities, while for periodic roots it returns finitely many mutually disjoint small intervals $I_i\subset[-\pi,\pi]$, integers $c_i>0$ and multisets of root multiplicities $\{m_{j,i}\}_{j=1}^{c_i}$ such that any periodic root $t>\mu_+$ is in the set $(\sqcup_i\cup_{k\in\mathbb{N}}(I_i+2k\pi))$ and any interval $I_i+2k\pi\subset(\mu_+,\infty)$ contains exactly $c_i$ periodic roots with multiplicities $m_{1,i},...,m_{c_i,i}$, respectively. The effectiveness and efficiency of the algorithm are shown by experiments. In particular, our results indicate that the "distributions" of the roots of an MTP in the "periods" $(-\pi,\pi]+2k\pi$ sufficiently far from $0$ share the same pattern. Besides, the method used to isolate the roots in $[\mu_-,\mu_+]$ is applicable to any other bounded interval as well. The algorithm takes advantage of the weak Fourier sequence technique and deals with the intervals period-by-period without scaling the coordinate, so as to keep the length of the sequence short. The new approaches can easily be modified to decide whether there is any root, or whether there are infinitely many roots, in unbounded intervals of the form $(-\infty,a)$ or $(a,\infty)$ with $a\in\mathbb{Q}$.

preprint · 2022 · arXiv

Boost Test-Time Performance with Closed-Loop Inference

Conventional deep models predict a test sample with a single forward propagation, which, however, may not be sufficient for hard-to-classify samples. In contrast, we human beings may need to carefully check a sample many times before making a final decision. During the recheck process, one may refine or adjust the prediction by referring to related samples. Motivated by this, we propose to predict those hard-to-classify test samples in a looped manner to boost model performance. However, this idea poses a critical challenge: how to construct looped inference so that the original erroneous predictions on these hard test samples can be corrected with little additional effort. To address this, we propose a general Closed-Loop Inference (CLI) method. Specifically, we first devise a filtering criterion to identify those hard-to-classify test samples that need additional inference loops. For each hard sample, we construct an additional auxiliary learning task based on its original top-$K$ predictions to calibrate the model, and then use the calibrated model to obtain the final prediction. Promising results on ImageNet (in-distribution test samples) and ImageNet-C (out-of-distribution test samples) demonstrate the effectiveness of CLI in improving the performance of any pre-trained model.

preprint · 2022 · arXiv

Square-free Strong Triangular Decomposition of Zero-dimensional Polynomial Systems

Triangular decomposition with different properties has been used for various types of problem solving, e.g. geometry theorem proving and real solution isolation of zero-dimensional polynomial systems. In this paper, the concepts of strong chain and square-free strong triangular decomposition (SFSTD) of zero-dimensional polynomial systems are defined. Because of its good properties, SFSTD may be a key approach to many problems related to zero-dimensional polynomial systems, such as real solution isolation and computing radicals of zero-dimensional ideals. Inspired by the work of Wang and of Dong and Mou, we propose an algorithm for computing SFSTD based on Gröbner bases computation. The novelty of the algorithm is that we make use of saturated ideals and separants to ensure, respectively, that the zero sets of any two strong chains have no intersection and that every strong chain is square-free. On the one hand, we prove that the arithmetic complexity of the new algorithm can be single exponential in the square of the number of variables, which seems to be among the rare complexity analysis results for triangular-decomposition methods. On the other hand, we show experimentally that, on a large number of examples in the literature, the new algorithm is far more efficient than a popular triangular-decomposition method based on pseudo-division. Furthermore, it is also shown that, on those examples, the methods based on SFSTD for real solution isolation and for computing radicals of zero-dimensional ideals are very efficient.

preprint · 2021 · arXiv

Choosing the Variable Ordering for Cylindrical Algebraic Decomposition via Exploiting Chordal Structure

Cylindrical algebraic decomposition (CAD) plays an important role in the field of real algebraic geometry and many other areas. As is well known, the choice of variable ordering while computing CAD has a great effect on the time and memory use of the computation as well as the number of sample points computed. In this paper, we indicate that typical CAD algorithms, if executed with respect to a special kind of variable ordering (called "the perfect elimination orderings"), naturally preserve chordality, which is an important property concerning the sparsity of variables. Experimentation suggests that if the associated graph of the polynomial system in question is chordal (resp., is nearly chordal), then a perfect elimination ordering of the associated graph (resp., of a minimal chordal completion of the associated graph) can be a good variable ordering for the CAD computation. That is, by using the perfect elimination orderings, the CAD computation may produce a much smaller full set of projection polynomials than by using other naive variable orderings. More importantly, for the complexity analysis of the CAD computation via a perfect elimination ordering, a so-called $(m,d)$-property of the full set of projection polynomials obtained via such an ordering is given, through which the "size" of this set is characterized. This property indicates that when the corresponding perfect elimination tree has a lower height, the full set of projection polynomials also tends to have a smaller "size". This agrees well with the experimental results, hence the perfect elimination orderings with lower elimination tree height are further recommended for use in the CAD projection.
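The graph side of this idea can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes the associated graph joins two variables whenever they occur in the same polynomial, and it uses maximum cardinality search (a standard technique, swapped in here for illustration) to obtain a perfect elimination ordering of a chordal graph. The helper names and the example systems are hypothetical, and no CAD computation is performed.

```python
from itertools import combinations


def associated_graph(poly_vars):
    """Adjacency sets: two variables are adjacent if they share a polynomial."""
    adj = {v: set() for vs in poly_vars for v in vs}
    for vs in poly_vars:
        for a, b in combinations(vs, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def max_cardinality_search(adj):
    """Visit vertices, always picking one with the most visited neighbours."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(sorted(unvisited), key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for w in adj[v]:
            if w in unvisited:
                weight[w] += 1
    return order


def is_perfect_elimination_ordering(adj, order):
    """Each vertex's later neighbours must be pairwise adjacent (a clique)."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [w for w in adj[v] if pos[w] > pos[v]]
        if any(b not in adj[a] for a, b in combinations(later, 2)):
            return False
    return True


# Variable sets of a hypothetical system: a 4-cycle a-b-c-d plus the chord a-c.
chordal = associated_graph([{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}, {"a", "c"}])
# For a chordal graph, the reverse of an MCS visit order is a perfect elimination ordering.
peo = list(reversed(max_cardinality_search(chordal)))

# Without the chord the graph is not chordal, so no such ordering exists.
cycle = associated_graph([{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}])
candidate = list(reversed(max_cardinality_search(cycle)))
```

Here `is_perfect_elimination_ordering(chordal, peo)` holds, while the same check fails for `candidate` on the chordless cycle. How such an ordering maps onto the projection order of a CAD implementation is left open in this sketch.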

preprint · 2020 · arXiv

Generative Low-bitwidth Data Free Quantization

Neural network quantization is an effective way to compress deep models and improve their execution latency and energy efficiency, so that they can be deployed on mobile or embedded devices. Existing quantization methods require original data for calibration or fine-tuning to get better performance. However, in many real-world scenarios, the data may not be available due to confidentiality or privacy issues, making existing quantization methods inapplicable. Moreover, due to the absence of original data, the recently developed generative adversarial networks (GANs) cannot be applied to generate data. Although the full-precision model may contain rich data information, such information alone is hard to exploit for recovering the original data or generating new meaningful data. In this paper, we investigate a simple-yet-effective method called Generative Low-bitwidth Data Free Quantization (GDFQ) to remove the data dependence burden. Specifically, we propose a knowledge matching generator to produce meaningful fake data by exploiting classification boundary knowledge and distribution information in the pre-trained model. With the help of generated data, we can quantize a model by learning knowledge from the pre-trained model. Extensive experiments on three data sets demonstrate the effectiveness of our method. More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data-free quantization method. Code is available at https://github.com/xushoukai/GDFQ.
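To make "4-bit quantization" concrete, here is a minimal sketch of a generic uniform quantizer. It is an assumption for illustration only: the abstract does not specify the quantizer used by GDFQ, the `quantize_uniform` helper and the sample weights are hypothetical, and the generator-based training is not shown.

```python
def quantize_uniform(values, bits):
    """Snap each value to the nearest of 2**bits evenly spaced levels
    spanning [min(values), max(values)]."""
    levels = 2 ** bits - 1          # number of steps between the extremes
    lo, hi = min(values), max(values)
    if hi == lo:                    # constant input: nothing to quantize
        return list(values)
    scale = (hi - lo) / levels      # width of one quantization step
    return [lo + round((v - lo) / scale) * scale for v in values]


# Hypothetical weights; with bits=4 there are at most 16 distinct levels.
weights = [-1.0, -0.2, 0.3, 1.0]
quantized = quantize_uniform(weights, bits=4)
step = (max(weights) - min(weights)) / (2 ** 4 - 1)
```

Each quantized value lies within half a step of the original, so the rounding error shrinks as the bit width grows. In practice the range is usually calibrated on data, which is the dependence a data-free method has to work around.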

preprint · 2020 · arXiv

Solving Satisfiability of Polynomial Formulas By Sample-Cell Projection

A new algorithm for deciding the satisfiability of polynomial formulas over the reals is proposed. The key point of the algorithm is a new projection operator, called sample-cell projection operator, custom-made for Conflict-Driven Clause Learning (CDCL)-style search. Although the new operator is also a CAD (Cylindrical Algebraic Decomposition)-like projection operator which computes the cell (not necessarily cylindrical) containing a given sample such that each polynomial from the problem is sign-invariant on the cell, it is of singly exponential time complexity. The sample-cell projection operator can efficiently guide CDCL-style search away from conflicting states. Experiments show the effectiveness of the new algorithm.