Source author record

Yanjun Li

Yanjun Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, Machine Learning, Logic in Computer Science, math.DS, math.OC, Quantitative Methods, Artificial Intelligence, Computational Complexity, Cryptography and Security, eess.IV, eess.SP, Genomics, Networking and Internet Architecture, Populations and Evolution, Software Engineering, Systems and Control, Tissues and Organs

Catalog footprint

What is connected

20 works

18 topics

4 close collaborators

Preprint, 2026, arXiv

AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization

PHP's dominance in web development is undermined by security challenges: static analysis lacks semantic depth, causing high false positives; dynamic analysis is computationally expensive; and automated vulnerability localization suffers from coarse granularity and imprecise context. Additionally, the absence of large-scale PHP vulnerability datasets and fragmented toolchains hinder real-world deployment. We present AutoVulnPHP, an end-to-end framework coupling two-stage vulnerability detection with fine-grained automated localization. SIFT-VulMiner (Structural Inference for Flaw Triage Vulnerability Miner) generates vulnerability hypotheses using AST structures enhanced with data flow. SAFE-VulMiner (Semantic Analysis for Flaw Evaluation Vulnerability Miner) verifies candidates through pretrained code encoder embeddings, eliminating false positives. ISAL (Incremental Sequence Analysis for Localization) pinpoints root causes via syntax-guided tracing, chain-of-thought LLM inference, and causal consistency checks to ensure precision. We contribute PHPVD, the first large-scale PHP vulnerability dataset with 26,614 files (5.2M LOC) across seven vulnerability types. On public benchmarks and PHPVD, AutoVulnPHP achieves 99.7% detection accuracy, 99.5% F1 score, and 81.0% localization rate. Deployed on real-world repositories, it discovered 429 previously unknown vulnerabilities, 351 of which were assigned CVE identifiers, validating its practical effectiveness.

Preprint, 2022, arXiv

Analysis and visualization of spatial transcriptomic data

Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured manners. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways. RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding to recover original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization, and spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.

Preprint, 2022, arXiv

Minimal Binary Linear Codes from Vectorial Boolean Functions

Recently, much progress has been made in constructing minimal linear codes, owing to their use in secret sharing schemes and secure two-party computation. In this paper, we put forward a new method to construct minimal linear codes by using vectorial Boolean functions. Firstly, we give a necessary and sufficient condition for a generic class of linear codes from vectorial Boolean functions to be minimal. Based on that, we derive some new three-weight minimal linear codes and determine their weight distributions. Secondly, we obtain a necessary and sufficient condition for another generic class of linear codes from vectorial Boolean functions to be minimal and to violate the AB condition. As a result, we get three infinite families of minimal linear codes violating the AB condition. To the best of our knowledge, this is the first time that minimal linear codes are constructed from vectorial Boolean functions. Compared with other known ones, the minimal linear codes obtained in this paper generally have higher dimensions.
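
For readers unfamiliar with the "AB condition" named in the abstract above: the abstract does not restate it, so the following is the standard Ashikhmin-Barg sufficient condition as it is usually given in the coding-theory literature, not a formula taken from this paper. Here $w_{\min}$ and $w_{\max}$ denote the minimum and maximum nonzero Hamming weights of a linear code over $\mathbb{F}_q$.

```latex
% Sufficient condition of Ashikhmin and Barg for a linear code over F_q to be minimal
\frac{w_{\min}}{w_{\max}} > \frac{q-1}{q}
% Binary case (q = 2), the setting of the abstract above
\frac{w_{\min}}{w_{\max}} > \frac{1}{2}
```

A code "violating the AB condition" is therefore one with $w_{\min}/w_{\max} \le (q-1)/q$; because the condition is only sufficient, such a code can still be minimal, which is what makes those families notable.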

Preprint, 2021, arXiv

Constructing new APN functions through relative trace functions

In 2020, Budaghyan, Helleseth and Kaleyski [IEEE TIT 66(11): 7081-7087, 2020] considered an infinite family of quadrinomials over $\mathbb{F}_{2^{n}}$ of the form $x^3+a(x^{2^s+1})^{2^k}+bx^{3\cdot 2^m}+c(x^{2^{s+m}+2^m})^{2^k}$, where $n=2m$ with $m$ odd. They proved that such kind of quadrinomials can provide new almost perfect nonlinear (APN) functions when $\gcd(3,m)=1$, $ k=0 $, and $(s,a,b,c)=(m-2,ω, ω^2,1)$ or $((m-2)^{-1}~{\rm mod}~n,ω, ω^2,1)$ in which $ω\in\mathbb{F}_4\setminus \mathbb{F}_2$. By taking $a=ω$ and $b=c=ω^2$, we observe that such kind of quadrinomials can be rewritten as $a {\rm Tr}^{n}_{m}(bx^3)+a^q{\rm Tr}^{n}_{m}(cx^{2^s+1})$, where $q=2^m$ and $ {\rm Tr}^n_{m}(x)=x+x^{2^m} $ for $ n=2m$. Inspired by the quadrinomials and our observation, in this paper we study a class of functions with the form $f(x)=a{\rm Tr}^{n}_{m}(F(x))+a^q{\rm Tr}^{n}_{m}(G(x))$ and determine the APN-ness of this new kind of functions, where $a \in \mathbb{F}_{2^n} $ such that $ a+a^q\neq 0$, and both $F$ and $G$ are quadratic functions over $\mathbb{F}_{2^n}$. We first obtain a characterization of the conditions for $f(x)$ such that $f(x) $ is an APN function. With the help of this characterization, we obtain an infinite family of APN functions for $ n=2m $ with $m$ being an odd positive integer: $ f(x)=a{\rm Tr}^{n}_{m}(bx^3)+a^q{\rm Tr}^{n}_{m}(b^3x^9) $, where $ a\in \mathbb{F}_{2^n}$ such that $ a+a^q\neq 0 $ and $ b $ is a non-cube in $ \mathbb{F}_{2^n} $.
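
The APN property discussed in the abstract above can be checked by brute force on small fields. The sketch below is illustrative only: it tests the classical Gold function $x^3$ (the leading term of the quadrinomial) over $\mathbb{F}_{2^4}$, not the new family from the paper. The field size, the modulus $x^4+x+1$, and all function names are assumptions made for this example. A function is APN when, for every nonzero $a$, the equation $f(x+a)+f(x)=b$ has at most two solutions for every $b$.

```python
def gf_mul(a, b, n, poly):
    """Multiply a and b in GF(2^n); poly is the irreducible modulus including the x^n bit."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:  # degree reached n: reduce modulo poly
            a ^= poly
    return result


def gf_pow(x, e, n, poly):
    """Raise x to the power e in GF(2^n) by square-and-multiply."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, x, n, poly)
        x = gf_mul(x, x, n, poly)
        e >>= 1
    return result


def differential_uniformity(f, n):
    """Largest number of solutions x of f(x + a) + f(x) = b over all a != 0 and all b."""
    size = 1 << n
    table = [f(x) for x in range(size)]
    worst = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[table[x ^ a] ^ table[x]] += 1  # addition in GF(2^n) is XOR
        worst = max(worst, max(counts))
    return worst


N = 4            # illustrative field GF(2^4)
POLY = 0b10011   # x^4 + x + 1, irreducible over GF(2)


def cube(x):
    return gf_pow(x, 3, N, POLY)


def fifth_power(x):
    return gf_pow(x, 5, N, POLY)


def is_apn(f, n):
    """APN means differential uniformity exactly 2 in characteristic 2."""
    return differential_uniformity(f, n) == 2
```

Under these assumptions, `is_apn(cube, N)` holds, while `fifth_power` fails the test: $x^5 = x^{2^2+1}$ is a Gold function with $\gcd(2,4)\neq 1$, so its derivatives have four-element kernels.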

Preprint, 2021, arXiv

Joint Dimensionality Reduction for Separable Embedding Estimation

Low-dimensional embeddings for data from disparate sources play critical roles in multi-modal machine learning, multimedia information retrieval, and bioinformatics. In this paper, we propose a supervised dimensionality reduction method that learns linear embeddings jointly for two feature vectors representing data of different modalities or data from distinct types of entities. We also propose an efficient feature selection method that complements, and can be applied prior to, our joint dimensionality reduction method. Assuming that there exist true linear embeddings for these features, our analysis of the error in the learned linear embeddings provides theoretical guarantees that the dimensionality reduction method accurately estimates the true embeddings when certain technical conditions are satisfied and the number of samples is sufficiently large. The derived sample complexity results are echoed by numerical experiments. We apply the proposed dimensionality reduction method to gene-disease association, and predict unknown associations using kernel regression on the dimension-reduced feature vectors. Our approach compares favorably against other dimensionality reduction methods, and against a state-of-the-art method of bilinear regression for predicting gene-disease associations.

Preprint, 2020, arXiv

A Set-Theoretic Study of the Relationships of Image Models and Priors for Restoration Problems

Image prior modeling is the key issue in image recovery, computational imaging, compressed sensing, and other inverse problems. Recent algorithms combining multiple effective priors, such as sparse or low-rank models, have demonstrated superior performance in various applications. However, the relationships among the popular image models are unclear, and no general theory is available to demonstrate their connections. In this paper, we present a theoretical analysis of image models, including sparsity, group-wise sparsity, joint sparsity, and low-rankness, to bridge the gap between applications and image prior understanding. We systematically study how effective each image model is for image restoration. Furthermore, we relate the denoising performance improvement obtained by combining multiple models to the relationships among the image models. Extensive experiments compare the denoising results, which are consistent with our analysis. On top of the model-based methods, we quantitatively demonstrate the image properties that are implicitly exploited by deep learning methods, which can further boost denoising performance when combined with complementary image models.

Preprint, 2020, arXiv

Application of Deep Interpolation Network for Clustering of Physiologic Time Series

Background: During the early stages of hospital admission, clinicians must use limited information to make diagnostic and treatment decisions as patient acuity evolves. However, time-series vital sign information from patients is commonly both sparse and irregularly collected, which poses a significant challenge for machine and deep learning techniques that aim to help clinicians improve health outcomes. To deal with this problem, we propose a novel deep interpolation network to extract latent representations from sparse and irregularly sampled time-series vital signs measured within six hours of hospital admission. Methods: We created a single-center longitudinal dataset of electronic health record data for all (n=75,762) adult patient admissions to a tertiary care center lasting six hours or longer, using 55% of the dataset for training, 23% for validation, and 22% for testing. All raw time series within six hours of hospital admission were extracted for six vital signs (systolic blood pressure, diastolic blood pressure, heart rate, temperature, blood oxygen saturation, and respiratory rate). A deep interpolation network is proposed to learn from such irregular and sparse multivariate time series data and to extract fixed low-dimensional latent patterns. We use the k-means clustering algorithm to group the patient admissions into seven clusters. Findings: Training, validation, and testing cohorts had similar age (55-57 years), sex (55% female), and admission vital signs. Seven distinct clusters were identified. Interpretation: In a heterogeneous cohort of hospitalized patients, a deep interpolation network extracted representations from vital sign data measured within six hours of hospital admission. This approach may have important implications for clinical decision-support under time constraints and uncertainty.

Preprint, 2020, arXiv

PRI-VAE: Principle-of-Relevant-Information Variational Autoencoders

Although substantial efforts have been made to learn disentangled representations under the variational autoencoder (VAE) framework, the fundamental properties of the learning dynamics of most VAE models remain unknown and under-investigated. In this work, we first propose a novel learning objective, termed the principle-of-relevant-information variational autoencoder (PRI-VAE), to learn disentangled representations. We then present an information-theoretic perspective to analyze existing VAE models by inspecting the evolution of some critical information-theoretic quantities across training epochs. Our observations unveil some fundamental properties associated with VAEs. Empirical results also demonstrate the effectiveness of PRI-VAE on four benchmark data sets.

Preprint, 2016, arXiv

A Dynamic Epistemic Framework for Conformant Planning

In this paper, we introduce a lightweight dynamic epistemic logical framework for automated planning under initial uncertainty. We reduce plan verification and conformant planning to model checking problems of our logic. We show that the model checking problem of the iteration-free fragment is PSPACE-complete. By using two non-standard (but equivalent) semantics, we give novel model checking algorithms for the full language and the iteration-free language.

Preprint, 2016, arXiv

Achieving while maintaining: A logic of knowing how with intermediate constraints

In this paper, we propose a ternary knowing how operator to express that the agent knows how to achieve $ϕ$ given $ψ$ while maintaining $χ$ in-between. It generalizes the logic of goal-directed knowing how proposed by Yanjing Wang in 'A logic of knowing how' (2015). We give a sound and complete axiomatization of this logic.

Preprint, 2016, arXiv

Cooperative output regulation of multi-agent network systems with dynamic edges

This paper investigates a new class of linear multi-agent network systems, in which nodes are coupled by dynamic edges in the sense that each edge has a dynamic system attached as well. The outputs of the edge dynamic systems form the external inputs of the node dynamic systems, which are termed "neighboring inputs" representing the coupling actions between nodes. The outputs of the node dynamic systems are the inputs of the edge dynamic systems. Several cooperative output regulation problems are posed, including output synchronization, output cooperation and master-slave output cooperation. Output cooperation is specified as making the neighboring input, a weighted sum of edge outputs, track a predefined trajectory by cooperation of node outputs. Distributed cooperative output regulation controllers depending on local state and neighboring inputs are presented, which are designed by combining feedback passivity theories and the internal model principle. A simulation example on the cooperative current control of an electrical network illustrates the potential applications of the analytical results.

Preprint, 2016, arXiv

Joint Dimensionality Reduction for Two Feature Vectors

Many machine learning problems, especially multi-modal learning problems, have two sets of distinct features (e.g., image and text features in news story classification, or neuroimaging data and neurocognitive data in cognitive science research). This paper addresses the joint dimensionality reduction of two feature vectors in supervised learning problems. In particular, we assume a discriminative model where low-dimensional linear embeddings of the two feature vectors are sufficient statistics for predicting a dependent variable. We show that a simple algorithm involving singular value decomposition can accurately estimate the embeddings provided that certain sample complexities are satisfied, without specifying the nonlinear link function (regressor or classifier). The main results establish sample complexities under multiple settings. Sample complexities for different link functions only differ by constant factors.

Preprint, 2016, arXiv

Stability and steady state analysis of distributed cooperative droop controlled DC microgrids

This paper studies distributed cooperative droop control, which consists of primary decentralized droop control and secondary distributed correction control and aims to achieve exact current sharing between generators operating in voltage control mode in DC microgrids. For DC microgrids with distributed cooperative droop control, the dynamic stability has not been well investigated, although the steady-state performance has been widely reported. This paper focuses on the stability problem and shows that it is equivalent to the semistability problem of a class of second-order matrix systems. Some further sufficient conditions follow as well. The steady state is analyzed in depth for some special cases. A DC microgrid of three nodes is simulated on the Matlab/Simulink platform to illustrate the efficacy of the analytic results.

Preprint, 2015, arXiv

A Unified Framework for Identifiability Analysis in Bilinear Inverse Problems with Applications to Subspace and Sparsity Models

Bilinear inverse problems (BIPs), the resolution of two vectors given their image under a bilinear mapping, arise in many applications. Without further constraints, BIPs are usually ill-posed. In practice, properties of natural signals are exploited to solve BIPs. For example, subspace constraints or sparsity constraints are imposed to reduce the search space. These approaches have shown some success in practice. However, there are few results on uniqueness in BIPs. For most BIPs, the fundamental question of under what condition the problem admits a unique solution, is yet to be answered. For example, blind gain and phase calibration (BGPC) is a structured bilinear inverse problem, which arises in many applications, including inverse rendering in computational relighting (albedo estimation with unknown lighting), blind phase and gain calibration in sensor array processing, and multichannel blind deconvolution (MBD). It is interesting to study the uniqueness of such problems. In this paper, we define identifiability of a BIP up to a group of transformations. We derive necessary and sufficient conditions for such identifiability, i.e., the conditions under which the solutions can be uniquely determined up to the transformation group. Applying these results to BGPC, we derive sufficient conditions for unique recovery under several scenarios, including subspace, joint sparsity, and sparsity models. For BGPC with joint sparsity or sparsity constraints, we develop a procedure to compute the relevant transformation groups. We also give necessary conditions in the form of tight lower bounds on sample complexities, and demonstrate the tightness of these bounds by numerical experiments. The results for BGPC not only demonstrate the application of the proposed general framework for identifiability analysis, but are also of interest in their own right.

Preprint, 2015, arXiv

Blind Recovery of Sparse Signals from Subsampled Convolution

Subsampled blind deconvolution is the recovery of two unknown signals from samples of their convolution. To overcome the ill-posedness of this problem, solutions based on priors tailored to specific applications have been developed. In particular, sparsity models have provided promising priors. However, in spite of the empirical success of these methods in many applications, existing analyses are rather limited in two main ways: by disparity between the theoretical assumptions on the signal and/or measurement model versus practical setups; or by failure to provide a performance guarantee for parameter values within the optimal regime defined by the information theoretic limits. In particular, it has been shown that a naive sparsity model is not a strong enough prior for identifiability in the blind deconvolution problem. Instead, in addition to sparsity, we adopt a conic constraint, which enforces spectral flatness of the signals. Under this prior, we provide an iterative algorithm that achieves guaranteed performance in blind deconvolution at near optimal sample complexity. Numerical results show that the empirical performance of the iterative algorithm agrees with the performance guarantee.

Preprint, 2015, arXiv

Identifiability in Blind Deconvolution with Subspace or Sparsity Constraints

Blind deconvolution (BD), the resolution of a signal and a filter given their convolution, arises in many applications. Without further constraints, BD is ill-posed. In practice, subspace or sparsity constraints have been imposed to reduce the search space, and have shown some empirical success. However, existing theoretical analysis on uniqueness in BD is rather limited. As an effort to address the still mysterious question, we derive sufficient conditions under which two vectors can be uniquely identified from their circular convolution, subject to subspace or sparsity constraints. These sufficient conditions provide the first algebraic sample complexities for BD. We first derive a sufficient condition that applies to almost all bases or frames. For blind deconvolution of vectors in $\mathbb{C}^n$, with two subspace constraints of dimensions $m_1$ and $m_2$, the required sample complexity is $n\geq m_1m_2$. Then we impose a sub-band structure on one basis, and derive a sufficient condition that involves a relaxed sample complexity $n\geq m_1+m_2-1$, which we show to be optimal. We present the extensions of these results to BD with sparsity constraints or mixed constraints, with the sparsity level replacing the subspace dimension. The cost for the unknown support in this case is an extra factor of 2 in the sample complexity.
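
To make the ill-posedness mentioned in the abstract above concrete, the following minimal sketch computes a circular convolution directly from its definition and shows the scaling ambiguity: the pairs $(x, y)$ and $(\alpha x, y/\alpha)$ give the same observation. The signals, the value of `alpha`, and the helper names are illustrative assumptions; the two sample-complexity helpers simply restate the bounds $n \geq m_1 m_2$ and $n \geq m_1 + m_2 - 1$ quoted in the abstract and do not implement the paper's proofs.

```python
def circular_convolution(x, y):
    """Circular convolution z[k] = sum_j x[j] * y[(k - j) mod n] of two length-n sequences."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    return [sum(x[j] * y[(k - j) % n] for j in range(n)) for k in range(n)]


def generic_sample_bound(m1, m2):
    """Bound n >= m1 * m2 quoted for almost all bases or frames."""
    return m1 * m2


def subband_sample_bound(m1, m2):
    """Relaxed bound n >= m1 + m2 - 1 quoted for a sub-band structured basis."""
    return m1 + m2 - 1


# Illustrative signals (hypothetical values chosen so the arithmetic is exact).
x = [1.0, 2.0, 0.0, -1.0]
y = [0.5, 0.0, 1.0, 0.0]
z = circular_convolution(x, y)

# Scaling ambiguity: rescaling one factor and inversely rescaling the other
# leaves the observed convolution unchanged.
alpha = 4.0
z_scaled = circular_convolution([alpha * v for v in x], [v / alpha for v in y])
```

Because `z_scaled` equals `z`, no method can distinguish the two pairs from the convolution alone, so uniqueness is only meaningful up to such a scaling; the constraints studied in the abstract address the remaining ambiguities.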

Preprint, 2015, arXiv

Optimal Sample Complexity for Blind Gain and Phase Calibration

Blind gain and phase calibration (BGPC) is a structured bilinear inverse problem, which arises in many applications, including inverse rendering in computational relighting (albedo estimation with unknown lighting), blind phase and gain calibration in sensor array processing, and multichannel blind deconvolution. The fundamental question of the uniqueness of the solutions to such problems has been addressed only recently. In a previous paper, we proposed studying the identifiability in bilinear inverse problems up to transformation groups. In particular, we studied several special cases of blind gain and phase calibration, including the cases of subspace and joint sparsity models on the signals, and gave sufficient and necessary conditions for identifiability up to certain transformation groups. However, there were gaps between the sample complexities in the sufficient conditions and the necessary conditions. In this paper, under a mild assumption that the signals and models are generic, we bridge the gaps by deriving tight sufficient conditions with optimal sample complexities.

Preprint, 2015, arXiv

RF-Based Charger Placement for Duty Cycle Guarantee in Battery-Free Sensor Networks

Battery-free sensor networks have emerged as a promising solution to conquer the lifetime limitation of battery-powered systems. In this paper, we study a sensor network built from battery-free sensor nodes which harvest energy from radio frequency (RF) signals transmitted by RF-based chargers, e.g., radio frequency identification (RFID) readers. Due to the insufficiency of harvested energy, the sensor nodes have to work in duty cycles to harvest enough energy before turning active and performing tasks. One fundamental issue in this kind of network design is how to deploy the chargers to ensure that the battery-free nodes can maintain a designated duty cycle for continuous operation. Based on a new wireless recharge model, we formulate the charger placement problem for node's duty cycle guarantee as a constrained optimization problem. We develop both greedy and efficient heuristics for solving the problem and validate our solutions through extensive simulations. The simulation results show that the proposed particle swarm optimization (PSO)-based divide-and-conquer approach can effectively reduce the number of chargers compared with the greedy approach.
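
The abstract above does not spell out its greedy baseline or its wireless recharge model, so the following is only a generic set-cover style sketch of what a greedy charger placement could look like. It assumes, hypothetically, that each candidate site comes with a precomputed set of nodes whose duty cycle it can sustain; the function name, the `candidate_coverage` mapping, and the example data are all invented for illustration.

```python
def greedy_charger_placement(candidate_coverage, nodes):
    """Pick charger sites one at a time, each time taking the site that
    serves the most still-unserved nodes (classic greedy set cover).

    candidate_coverage: dict mapping a site label to the set of nodes it can serve
                        (a hypothetical stand-in for a real recharge model).
    nodes: iterable of all nodes that must be served.
    """
    uncovered = set(nodes)
    chosen = []
    while uncovered:
        # Ties are broken by dictionary insertion order.
        best = max(candidate_coverage,
                   key=lambda site: len(candidate_coverage[site] & uncovered))
        gained = candidate_coverage[best] & uncovered
        if not gained:
            raise ValueError("some nodes cannot be served by any candidate site")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical example: four candidate sites, five sensor nodes.
coverage = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5},
    "D": {1, 5},
}
placement = greedy_charger_placement(coverage, range(1, 6))
```

In this toy instance the loop picks site A first (three nodes) and then site C (the two remaining nodes). The abstract reports that a PSO-based divide-and-conquer approach can use fewer chargers than a greedy approach.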

Preprint, 2015, arXiv

The Universality of Cancer

Cancer has been characterized as a constellation of hundreds of diseases differing in underlying mutations and depending on cellular environments. Carcinogenesis as a stochastic physical process has been studied for over sixty years, but there is no accepted standard model. We show that the hazard rates of all cancers are characterized by a simple dynamic stochastic process on a half-line, with a universal linear restoring force balancing a universal simple Brownian motion starting from a universal initial distribution. Only a critical radius defining the transition from normal to tumorigenic genomes distinguishes between different cancer types when time is measured in cell-cycle units. Reparametrizing to chronological time units introduces two additional parameters: the onset of cellular senescence with age and the time interval over which this cessation in replication takes place. This universality implies that there may exist a finite separation between normal cells and tumorigenic cells in all tissue types that may be a viable target for both early detection and preventive therapy.

Preprint, 2013, arXiv

Synchronized output regulation of nonlinear multi-agent systems

This paper considers the synchronized output regulation (SOR) problem of nonlinear multi-agent systems with a switching graph. SOR means that all agents regulate their outputs to synchronize on the output of a predefined common exosystem. Each agent constructs its local exosystem with the same dynamics as the common exosystem and exchanges the state information of the local exosystem. It is shown that the SOR problem is solvable under the same assumptions as those for nonlinear output regulation of a single agent, provided the switching graph satisfies the bounded interconnectivity times condition. Both state feedback and output feedback are addressed. A numerical simulation is presented to show the efficacy of the analytic results.
