Researcher profile

Yanjun Li

Yanjun Li contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
9 topics
4 close collaborators

Published work

8 published items

preprint · 2026 · arXiv

AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization

PHP's dominance in web development is undermined by security challenges: static analysis lacks semantic depth, causing high false positives; dynamic analysis is computationally expensive; and automated vulnerability localization suffers from coarse granularity and imprecise context. Additionally, the absence of large-scale PHP vulnerability datasets and fragmented toolchains hinder real-world deployment. We present AutoVulnPHP, an end-to-end framework coupling two-stage vulnerability detection with fine-grained automated localization. SIFT-VulMiner (Structural Inference for Flaw Triage Vulnerability Miner) generates vulnerability hypotheses using AST structures enhanced with data flow. SAFE-VulMiner (Semantic Analysis for Flaw Evaluation Vulnerability Miner) verifies candidates through pretrained code encoder embeddings, eliminating false positives. ISAL (Incremental Sequence Analysis for Localization) pinpoints root causes via syntax-guided tracing, chain-of-thought LLM inference, and causal consistency checks to ensure precision. We contribute PHPVD, the first large-scale PHP vulnerability dataset with 26,614 files (5.2M LOC) across seven vulnerability types. On public benchmarks and PHPVD, AutoVulnPHP achieves 99.7% detection accuracy, 99.5% F1 score, and 81.0% localization rate. Deployed on real-world repositories, it discovered 429 previously unknown vulnerabilities, 351 of which were assigned CVE identifiers, validating its practical effectiveness.

preprint · 2022 · arXiv

Analysis and visualization of spatial transcriptomic data

Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured manners. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways. RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding to recover original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization, and has spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.

preprint · 2022 · arXiv

Minimal Binary Linear Codes from Vectorial Boolean Functions

Recently, much progress has been made in constructing minimal linear codes, owing to their use in secret sharing schemes and secure two-party computation. In this paper, we put forward a new method to construct minimal linear codes by using vectorial Boolean functions. Firstly, we give a necessary and sufficient condition for a generic class of linear codes from vectorial Boolean functions to be minimal. Based on that, we derive some new three-weight minimal linear codes and determine their weight distributions. Secondly, we obtain a necessary and sufficient condition for another generic class of linear codes from vectorial Boolean functions to be minimal and to violate the AB condition. As a result, we get three infinite families of minimal linear codes violating the AB condition. To the best of our knowledge, this is the first time that minimal linear codes are constructed from vectorial Boolean functions. Compared with other known ones, the minimal linear codes obtained in this paper generally have higher dimensions.
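To make the two properties in this abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's construction: it only checks, for a small binary code given by a generator matrix, whether the code is minimal (no nonzero codeword's support strictly contains another's) and whether it meets the AB (Ashikhmin-Barg) condition, which for binary codes reads w_min / w_max > 1/2. The function names and the two example matrices are assumptions chosen for illustration.

```python
from itertools import product


def codewords(generator):
    """Return all nonzero codewords of the binary code spanned by the rows."""
    k, n = len(generator), len(generator[0])
    words = set()
    for coeffs in product((0, 1), repeat=k):
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, generator)) % 2
            for j in range(n)
        )
        if any(word):
            words.add(word)
    return words


def support(word):
    """Set of coordinate positions where the codeword is nonzero."""
    return frozenset(i for i, bit in enumerate(word) if bit)


def is_minimal(generator):
    """True if no nonzero codeword's support strictly contains another's."""
    supports = [support(w) for w in codewords(generator)]
    return not any(s < t for s in supports for t in supports)


def satisfies_ab(generator):
    """Binary AB condition: w_min / w_max > 1/2 (sufficient for minimality)."""
    weights = [sum(w) for w in codewords(generator)]
    return 2 * min(weights) > max(weights)


# Illustrative inputs (assumed, not from the paper):
# the [7, 3] binary simplex code, whose nonzero codewords all have weight 4,
SIMPLEX = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
# and the full space F_2^2, where codeword 11 covers both 10 and 01.
FULL_SPACE = [
    [1, 0],
    [0, 1],
]

print(is_minimal(SIMPLEX), satisfies_ab(SIMPLEX))
print(is_minimal(FULL_SPACE), satisfies_ab(FULL_SPACE))
```

The enumeration is exponential in the dimension, so this is only usable for toy codes. The families described in the abstract are minimal while failing the AB test, a case that neither example above exhibits.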

preprint · 2021 · arXiv

Constructing new APN functions through relative trace functions

In 2020, Budaghyan, Helleseth and Kaleyski [IEEE TIT 66(11): 7081-7087, 2020] considered an infinite family of quadrinomials over $\mathbb{F}_{2^{n}}$ of the form $x^3+a(x^{2^s+1})^{2^k}+bx^{3\cdot 2^m}+c(x^{2^{s+m}+2^m})^{2^k}$, where $n=2m$ with $m$ odd. They proved that quadrinomials of this kind can provide new almost perfect nonlinear (APN) functions when $\gcd(3,m)=1$, $k=0$, and $(s,a,b,c)=(m-2,\omega,\omega^2,1)$ or $((m-2)^{-1}~{\rm mod}~n,\omega,\omega^2,1)$, in which $\omega\in\mathbb{F}_4\setminus \mathbb{F}_2$. By taking $a=\omega$ and $b=c=\omega^2$, we observe that quadrinomials of this kind can be rewritten as $a {\rm Tr}^{n}_{m}(bx^3)+a^q{\rm Tr}^{n}_{m}(cx^{2^s+1})$, where $q=2^m$ and ${\rm Tr}^n_{m}(x)=x+x^{2^m}$ for $n=2m$. Inspired by the quadrinomials and our observation, in this paper we study a class of functions of the form $f(x)=a{\rm Tr}^{n}_{m}(F(x))+a^q{\rm Tr}^{n}_{m}(G(x))$ and determine the APN-ness of functions of this new kind, where $a \in \mathbb{F}_{2^n}$ is such that $a+a^q\neq 0$, and both $F$ and $G$ are quadratic functions over $\mathbb{F}_{2^n}$. We first obtain a characterization of the conditions under which $f(x)$ is an APN function. With the help of this characterization, we obtain an infinite family of APN functions for $n=2m$ with $m$ an odd positive integer: $f(x)=a{\rm Tr}^{n}_{m}(bx^3)+a^q{\rm Tr}^{n}_{m}(b^3x^9)$, where $a\in \mathbb{F}_{2^n}$ is such that $a+a^q\neq 0$ and $b$ is a non-cube in $\mathbb{F}_{2^n}$.
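The APN property and the relative trace ${\rm Tr}^n_m(x)=x+x^{2^m}$ used in this abstract can be checked by brute force on a small field. The sketch below is an illustration under stated assumptions, not the paper's method: it works in $\mathbb{F}_{2^6}$ (so $n=6$, $m=3$), represents field elements as integers reduced modulo the irreducible polynomial $x^6+x+1$, and tests the well-known Gold functions $x^3$ and $x^5$ rather than the new family. A function is APN exactly when its differential uniformity equals 2. All helper names are hypothetical.

```python
N = 6              # assumed field: F_{2^6}, i.e. n = 2m with m = 3
MOD = 0b1000011    # x^6 + x + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of F_{2^N} (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N:          # degree reached N: reduce modulo MOD
            a ^= MOD
    return result


def gf_pow(a, e):
    """Raise a to the non-negative integer power e in F_{2^N}."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def rel_trace(x, m=N // 2):
    """Relative trace Tr^n_m(x) = x + x^(2^m) from F_{2^n} to F_{2^m}."""
    return x ^ gf_pow(x, 1 << m)


def differential_uniformity(f):
    """Max over a != 0 and b of the number of x with f(x + a) + f(x) = b."""
    size = 1 << N
    table = [f(x) for x in range(size)]
    worst = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[table[x ^ a] ^ table[x]] += 1
        worst = max(worst, max(counts))
    return worst


def is_apn(f):
    """A function on F_{2^N} is APN iff its differential uniformity is 2."""
    return differential_uniformity(f) == 2


print(is_apn(lambda x: gf_pow(x, 3)))                    # Gold function x^3
print(differential_uniformity(lambda x: gf_pow(x, 5)))   # x^5, gcd(2, 6) = 2
```

Here $x^3$ is APN, while $x^5 = x^{2^2+1}$ has differential uniformity $2^{\gcd(2,6)} = 4$ and is therefore not APN. The same `differential_uniformity` helper could be pointed at a candidate of the form $a{\rm Tr}^{n}_{m}(F(x))+a^q{\rm Tr}^{n}_{m}(G(x))$ built from `rel_trace`, but the exhaustive search costs on the order of $2^{2n}$ evaluations and is only practical for small $n$.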

preprint · 2021 · arXiv

Joint Dimensionality Reduction for Separable Embedding Estimation

Low-dimensional embeddings for data from disparate sources play critical roles in multi-modal machine learning, multimedia information retrieval, and bioinformatics. In this paper, we propose a supervised dimensionality reduction method that learns linear embeddings jointly for two feature vectors representing data of different modalities or data from distinct types of entities. We also propose an efficient feature selection method that complements, and can be applied prior to, our joint dimensionality reduction method. Assuming that there exist true linear embeddings for these features, our analysis of the error in the learned linear embeddings provides theoretical guarantees that the dimensionality reduction method accurately estimates the true embeddings when certain technical conditions are satisfied and the number of samples is sufficiently large. The derived sample complexity results are echoed by numerical experiments. We apply the proposed dimensionality reduction method to gene-disease association, and predict unknown associations using kernel regression on the dimension-reduced feature vectors. Our approach compares favorably against other dimensionality reduction methods, and against a state-of-the-art method of bilinear regression for predicting gene-disease associations.

preprint · 2020 · arXiv

A Set-Theoretic Study of the Relationships of Image Models and Priors for Restoration Problems

Image prior modeling is the key issue in image recovery, computational imaging, compressed sensing, and other inverse problems. Recent algorithms combining multiple effective priors, such as the sparse or low-rank models, have demonstrated superior performance in various applications. However, the relationships among the popular image models are unclear, and no general theory is available to demonstrate their connections. In this paper, we present a theoretical analysis of image models, including sparsity, group-wise sparsity, joint sparsity, and low-rankness, to bridge the gap between applications and the understanding of image priors. We systematically study how effective each image model is for image restoration. Furthermore, we relate the denoising performance improvement obtained by combining multiple models to the relationships among the image models. Extensive experiments are conducted to compare the denoising results, which are consistent with our analysis. On top of the model-based methods, we quantitatively demonstrate the image properties that are implicitly exploited by deep learning methods, whose denoising performance can be further boosted by combining them with complementary image models.

preprint · 2020 · arXiv

Application of Deep Interpolation Network for Clustering of Physiologic Time Series

Background: During the early stages of hospital admission, clinicians must use limited information to make diagnostic and treatment decisions as patient acuity evolves. However, vital sign time series from patients are commonly both sparse and irregularly collected, which poses a significant challenge for machine learning and deep learning techniques that aim to analyze these data and help clinicians improve health outcomes. To deal with this problem, we propose a novel deep interpolation network to extract latent representations from sparse and irregularly sampled time-series vital signs measured within six hours of hospital admission. Methods: We created a single-center longitudinal dataset of electronic health record data for all (n=75,762) adult patient admissions to a tertiary care center lasting six hours or longer, using 55% of the dataset for training, 23% for validation, and 22% for testing. All raw time series within six hours of hospital admission were extracted for six vital signs (systolic blood pressure, diastolic blood pressure, heart rate, temperature, blood oxygen saturation, and respiratory rate). A deep interpolation network is proposed to learn from such irregular and sparse multivariate time series data and to extract fixed low-dimensional latent patterns. We use the k-means clustering algorithm to group the patient admissions into 7 clusters. Findings: Training, validation, and testing cohorts had similar age (55-57 years), sex (55% female), and admission vital signs. Seven distinct clusters were identified. Interpretation: In a heterogeneous cohort of hospitalized patients, a deep interpolation network extracted representations from vital sign data measured within six hours of hospital admission. This approach may have important implications for clinical decision-support under time constraints and uncertainty.

preprint · 2020 · arXiv

PRI-VAE: Principle-of-Relevant-Information Variational Autoencoders

Although substantial efforts have been made to learn disentangled representations under the variational autoencoder (VAE) framework, the fundamental properties of the learning dynamics of most VAE models remain unknown and under-investigated. In this work, we first propose a novel learning objective, termed the principle-of-relevant-information variational autoencoder (PRI-VAE), to learn disentangled representations. We then present an information-theoretic perspective to analyze existing VAE models by inspecting the evolution of some critical information-theoretic quantities across training epochs. Our observations unveil some fundamental properties associated with VAEs. Empirical results also demonstrate the effectiveness of PRI-VAE on four benchmark data sets.