Source author record

Angsheng Li

Angsheng Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


physics.soc-ph, Social and Information Networks, Data Structures and Algorithms, Artificial Intelligence, Information Theory, math.IT, Computation and Language, Machine Learning, math.LO, math.PR

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators


preprint, 2026, arXiv

Towards Compositional Generalization of LLMs via Skill Taxonomy Guided Data Synthesis

Large Language Models (LLMs) and agent-based systems often struggle with compositional generalization due to a data bottleneck in which complex skill combinations follow a long-tailed, power-law distribution, limiting both instruction-following performance and generalization in agent-centric tasks. To address this challenge, we propose STEPS, a Skill Taxonomy guided Entropy-based Post-training data Synthesis framework for generating compositionally challenging data. STEPS explicitly targets compositional generalization by uncovering latent relationships among skills and organizing them into an interpretable, hierarchical skill taxonomy using structural information theory. Building on this taxonomy, we formulate data synthesis as a constrained information maximization problem, selecting skill combinations that maximize marginal structural information within the hierarchy while preserving semantic coherence. Experiments on challenging instruction-following benchmarks show that STEPS outperforms existing data synthesis baselines, while also yielding improved compositional generalization in downstream agent-based evaluations.

preprint, 2020, arXiv

Structural Information Learning Machinery: Learning from Observing, Associating, Optimizing, Decoding, and Abstracting

In the present paper, we propose the model of structural information learning machines (SiLeM for short), leading to a mathematical definition of learning by merging the theories of computation and information. Our model shows that the essence of learning is to gain information, that to gain information is to eliminate uncertainty embedded in a data space, and that eliminating the uncertainty of a data space can be reduced to an optimization problem, namely an information optimization problem, which can be realized by a general encoding tree method. The principle of structural information learning machines is the maximization of decoding information from the observed data points together with the relationships among them; the criterion is the semantic interpretation of the syntactic essential structure. A SiLeM machine learns the laws or rules of nature. It observes the data points of the real world. It builds connections among the observed data and constructs a data space, choosing the connections so that the decoding information of the data space is maximized. It finds the encoding tree of the data space that minimizes the dynamical uncertainty of the data space; this encoding tree is referred to as a decoder, because it has eliminated the maximum amount of uncertainty embedded in the data space. It interprets the semantics of the decoder to form a knowledge tree. Finally, it extracts the remarkable common features, both semantic and syntactic, of the modules decoded by a decoder to construct trees of abstractions, which provide the foundations for intuitive reasoning when new data are observed.

preprint, 2015, arXiv

Testing Small Set Expansion in General Graphs

We consider the problem of testing small set expansion for general graphs. A graph $G$ is a $(k,ϕ)$-expander if every subset of volume at most $k$ has conductance at least $ϕ$. Small set expansion has recently received significant attention due to its close connection to the unique games conjecture, local graph partitioning algorithms and locally testable codes. We give testers with two-sided error and one-sided error in the adjacency list model that allows degree and neighbor queries to the oracle of the input graph. The testers take as input an $n$-vertex graph $G$, a volume bound $k$, an expansion bound $ϕ$ and a distance parameter $\varepsilon>0$. The two-sided error tester, with probability at least $2/3$, accepts the graph if it is a $(k,ϕ)$-expander and rejects the graph if it is $\varepsilon$-far from any $(k^*,ϕ^*)$-expander, where $k^*=Θ(k\varepsilon)$ and $ϕ^*=Θ(\frac{ϕ^4}{\min\{\log(4m/k),\log n\}\cdot(\ln k)})$. The query complexity and running time of the tester are $\widetilde{O}(\sqrt{m}ϕ^{-4}\varepsilon^{-2})$, where $m$ is the number of edges of the graph. The one-sided error tester accepts every $(k,ϕ)$-expander and, with probability at least $2/3$, rejects every graph that is $\varepsilon$-far from any $(k^*,ϕ^*)$-expander, where $k^*=O(k^{1-ξ})$ and $ϕ^*=O(ξϕ^2)$ for any $0<ξ<1$. The query complexity and running time of this tester are $\widetilde{O}(\sqrt{\frac{n}{\varepsilon^3}}+\frac{k}{\varepsilon ϕ^4})$. We also give a two-sided error tester with a smaller gap between $ϕ^*$ and $ϕ$ in the rotation map model that allows (neighbor, index) queries and degree queries.
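The $(k,ϕ)$-expander definition in this abstract can be made concrete with a small sketch. It assumes the usual convention that the conductance of a set is the number of edges leaving it divided by its volume (sum of degrees); the function names and the example graph are illustrative and do not come from the paper. The check enumerates every subset, so it takes exponential time and is not the sublinear tester the abstract describes.

```python
from itertools import combinations

def volume(adj, subset):
    """Sum of the degrees of the vertices in `subset`."""
    return sum(len(adj[v]) for v in subset)

def conductance(adj, subset):
    """Edges leaving `subset` divided by its volume (assumed convention)."""
    members = set(subset)
    cut = sum(1 for v in members for u in adj[v] if u not in members)
    return cut / volume(adj, members)

def is_small_set_expander(adj, k, phi):
    """Brute force: every subset of positive volume <= k has conductance >= phi."""
    vertices = list(adj)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            vol = volume(adj, subset)
            if 0 < vol <= k and conductance(adj, subset) < phi:
                return False
    return True

# Hypothetical example: a 4-cycle stored as an adjacency list.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

print(conductance(cycle4, [0, 1]))            # 2 cut edges / volume 4
print(is_small_set_expander(cycle4, 4, 0.5))  # sets of volume <= 4 only
print(is_small_set_expander(cycle4, 8, 0.1))  # the whole vertex set has no cut
```

Raising the volume bound to 8 admits the whole vertex set, whose cut is empty, which is why the definition is only interesting for small $k$.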

preprint, 2013, arXiv

Characters and patterns of communities in networks

In this paper, we propose some new notions to characterize and analyze communities. The new notions are general characters of the communities or local structures of networks. First, we introduce the notions of the internal dominating set and the external dominating set of a community. We show that most communities in real networks have a small internal dominating set and a small external dominating set, and that the internal dominating set of a community keeps much of the information of the community. Second, based on the notions of the internal dominating set and the external dominating set, we define an internal slope (ISlope, for short) and an external slope (ESlope, for short) to measure the internal heterogeneity and external heterogeneity of a community respectively. We show that the internal slope (ISlope) of a community largely determines the structure of the community, that most communities in real networks are heterogeneous, meaning that most of the communities have a core/periphery structure, and that both ISlopes and ESlopes (reflecting the structure of communities) of all the communities of a network approximately follow a normal distribution. Therefore typical values of both ISlopes and ESlopes of all the communities of a given network lie in a narrow interval, and only a small number of communities have ISlopes or ESlopes outside the range of typical values for the network. Finally, we show that all the communities of the real networks we studied exhibit a three-degree separation phenomenon, that is, the average distance within communities is approximately 3, implying a general property of true communities for many real networks, and that good community finding algorithms find communities that amplify the clustering coefficients of the networks, for many real networks.

preprint, 2013, arXiv

Community Structures Are Definable in Networks, and Universal in Real World

Community detection is one of the main approaches to understanding networks [For2010]. However, it has been a longstanding challenge to give a definition for community structures of networks. Here we found that community structures are definable in networks, and are universal in the real world. We proposed the notions of entropy- and conductance-community structure ratios. It was shown that the definition of modularity proposed in [NG2004] and our entropy- and conductance-community structures are equivalent in defining community structures of networks, that randomness in the ER model [ER1960] and preferential attachment in the PA model [Bar1999] are not mechanisms of community structures of networks, and that the existence of community structures is a universal phenomenon in real networks. Our results demonstrate that community structure is a universal phenomenon in the real world that is definable, solving the challenge of defining community structures in networks. This progress provides a foundation for a structural theory of networks.
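Of the three measures this abstract compares, only modularity has a widely used closed form that can be stated without the paper itself. The sketch below assumes the standard Newman-Girvan formula $Q=\sum_c \left(\frac{e_c}{m}-\left(\frac{\mathrm{vol}_c}{2m}\right)^2\right)$, where $e_c$ is the number of edges inside community $c$, $\mathrm{vol}_c$ is its total degree and $m$ is the number of edges. The entropy- and conductance-community structure ratios are not defined in the abstract and are not reproduced here; the graph and names are illustrative.

```python
def modularity(adj, communities):
    """Newman-Girvan modularity of a partition of an undirected simple graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is listed twice
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for v in members for u in adj[v] if u in members) / 2
        vol = sum(len(adj[v]) for v in members)
        q += internal / m - (vol / (2 * m)) ** 2
    return q

# Hypothetical example: two triangles joined by the single edge 2-3.
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

print(modularity(two_triangles, [[0, 1, 2], [3, 4, 5]]))  # the natural split
print(modularity(two_triangles, [[0, 1, 2, 3, 4, 5]]))    # one big community
```

Splitting along the bridge scores 5/14, while placing all vertices in one community scores 0, matching the intuition that modularity rewards partitions with few crossing edges.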

preprint, 2013, arXiv

Community Structures Are Definable in Networks: A Structural Theory of Networks

We found that neither randomness in the ER model nor preferential attachment in the PA model is the mechanism of community structures of networks, that community structures are universal in real networks, that community structures are definable in networks, that communities are interpretable in networks, and that homophyly is the mechanism of community structures and a structural theory of networks. We proposed the notions of entropy- and conductance-community structures. It was shown that the two definitions of the entropy- and conductance-community structures and the notion of modularity proposed by physicists are all equivalent in defining community structures of networks, that neither randomness in the ER model nor preferential attachment in the PA model is the mechanism of community structures of networks, and that the existence of community structures is a universal phenomenon in real networks. This poses a fundamental question: What are the mechanisms of community structures of real networks? To answer this question, we proposed a homophyly model of networks. It was shown that networks of our model satisfy a series of new topological, probabilistic and combinatorial principles, including a fundamental principle, a community structure principle, a degree priority principle, a widths principle, an inclusion and infection principle, a king node principle and a predicting principle. The new principles provide a firm foundation for a structural theory of networks. Our homophyly model demonstrates that homophyly is the underlying mechanism of community structures of networks, that nodes of the same community share common features, that the power law and the small world property are never obstacles to the existence of community structures in networks, that community structures are definable in networks, and that (natural) communities are interpretable.

preprint, 2013, arXiv

Detecting and Characterizing Small Dense Bipartite-like Subgraphs by the Bipartiteness Ratio Measure

We study the problem of finding and characterizing subgraphs with small bipartiteness ratio. We give a bicriteria approximation algorithm SwpDB such that if there exists a subset $S$ of volume at most $k$ and bipartiteness ratio $θ$, then for any $0<ε<1/2$, it finds a set $S'$ of volume at most $2k^{1+ε}$ and bipartiteness ratio at most $4\sqrt{θ/ε}$. By incorporating a truncation operation, we give a local algorithm LocDB, which has asymptotically the same approximation guarantee as the algorithm SwpDB on both the volume and bipartiteness ratio of the output set, and runs in time $O(ε^2θ^{-2}k^{1+ε}\ln^3k)$, independent of the size of the graph. Finally, we give a spectral characterization of the small dense bipartite-like subgraphs by using the $k$th largest eigenvalue of the Laplacian of the graph.
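The abstract does not define the bipartiteness ratio, so the sketch below rests on an assumption: it uses the definition common in the spectral graph theory literature, where for a set $S$ split into sides $L$ and $R$ the ratio is $\frac{2e(L)+2e(R)+e(S,V\setminus S)}{\mathrm{vol}(S)}$, minimized over all splits. The function name and example graphs are hypothetical, and the exhaustive search is exponential in $|S|$; it is not SwpDB or LocDB.

```python
from itertools import product

def bipartiteness_ratio(adj, subset):
    """Minimum over splits (L, R) of S of (2e(L) + 2e(R) + e(S, rest)) / vol(S)."""
    s = list(subset)
    vol = sum(len(adj[v]) for v in s)
    if vol == 0:
        raise ValueError("subset must have positive volume")
    best = None
    for sides in product((0, 1), repeat=len(s)):
        side = dict(zip(s, sides))
        cost = 0
        for v in s:
            for u in adj[v]:
                if u not in side:
                    cost += 1  # edge leaving S, seen once
                elif side[u] == side[v]:
                    cost += 1  # edge inside L or R, seen from both ends (total 2)
        if best is None or cost < best:
            best = cost
    return best / vol

# Hypothetical examples: an even cycle is bipartite, a triangle is not.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(bipartiteness_ratio(cycle4, [0, 1, 2, 3]))
print(bipartiteness_ratio(triangle, [0, 1, 2]))
```

Under this assumed definition a ratio of 0 means the induced subgraph is bipartite and disconnected from the rest of the graph, so small values capture the "dense bipartite-like" sets the abstract refers to.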

preprint, 2013, arXiv

Dimensions, Structures and Security of Networks

One of the main issues in modern network science is the phenomenon of cascading failures caused by a small number of attacks. Here we define the dimension of a network to be the maximal number of functions or features of nodes of the network. It was shown that there exist linear networks which are provably secure, where a network is linear if it has dimension one, that the high dimensions of networks are the mechanisms of overlapping communities, that overlapping communities are obstacles to network security, and that there exists an algorithm to reduce high dimensional networks to low dimensional ones which simultaneously preserves all the network properties and significantly amplifies the security of networks. Our results show that dimension is a fundamental measure of networks, that there exist linear networks which are provably secure, that high dimensional networks are insecure, and that the security of networks can be amplified by reducing dimensions.

preprint, 2013, arXiv

Homophyly and Randomness Resist Cascading Failure in Networks

The universal properties of networks, namely the power law and the small world phenomenon, seem to be unavoidable obstacles to the security of networking systems. Existing models never give secure networks. We found that the essence of security is security against cascading failures caused by attacks, and that nature achieves this security through mechanisms. We proposed a model of networks based on the natural mechanisms of homophyly, randomness and preferential attachment. It was shown that homophyly creates a community structure, that homophyly and randomness introduce ordering in the networks, and that homophyly creates inclusiveness and introduces rules of infections. These principles allow us to provably guarantee the security of the networks against any attacks. Our results show that security can be achieved provably by structures, that there is a tradeoff between the roles of structures and of thresholds in security engineering, and that the power law and the small world property are never obstacles to the security of networks.

preprint, 2013, arXiv

Homophyly Networks -- A Structural Theory of Networks

A grand challenge in network science is the apparent lack of a structural theory of networks. The authors have shown that the existence of community structures is a universal phenomenon in real networks, and that neither randomness nor preferential attachment is a mechanism of community structures of networks (A. Li, J. Li, and Y. Pan, Community structures are definable in networks, and universal in the real world, to appear). This poses a fundamental question: What are the mechanisms of community structures of real networks? Here we found that homophyly is the mechanism of community structures and a structural theory of networks. We proposed a homophyly model. It was shown that networks of our model satisfy a series of new topological, probabilistic and combinatorial principles, including a fundamental principle, a community structure principle, a degree priority principle, a widths principle, an inclusion and infection principle, a king node principle and a predicting principle, leading to a structural theory of networks. Our model demonstrates that homophyly is the underlying mechanism of community structures of networks, that nodes of the same community share common features, that the power law and the small world property are never obstacles to the existence of community structures in networks, and that community structures are definable in networks.

preprint, 2013, arXiv

Kolmogorov complexity and computably enumerable sets

We study the computably enumerable sets in terms of (a) the Kolmogorov complexity of their initial segments and (b) the Kolmogorov complexity of finite programs when they are used as oracles. We present an extended discussion of the existing research on this topic, along with recent developments and open problems. Besides this survey, our main original result is the following characterization of the computably enumerable sets with trivial initial segment prefix-free complexity. A computably enumerable set $A$ is $K$-trivial if and only if the family of sets with complexity bounded by the complexity of $A$ is uniformly computable from the halting problem.

preprint, 2013, arXiv

Provable Security of Networks

We propose a definition of security and a definition of robustness of networks against the cascading failure models of deliberate attacks and random errors respectively, and investigate the principles of the security and robustness of networks. We propose a security model such that networks constructed by the model are provably secure against any attacks of small sizes under the cascading failure models, and simultaneously follow a power law and have the small world property with a navigating algorithm of time complexity $O(\log n)$. It is shown that any network $G$ constructed from the security model satisfies some remarkable topological properties, including: (i) the small community phenomenon, that is, $G$ is rich in communities of the form $X$ of size polylogarithmic in $\log n$ with conductance bounded by $O(\frac{1}{|X|^β})$ for some constant $β$, (ii) the small diameter property, with diameter $O(\log n)$ allowing navigation by an $O(\log n)$ time algorithm to find a path between any two given nodes, and (iii) a power law distribution. It also satisfies some probabilistic and combinatorial principles, including the degree priority theorem and the infection-inclusion theorem. By using these principles, we show that a network $G$ constructed from the security model is secure against any attacks of small scales under both the uniform threshold and random threshold cascading failure models. Our security theorems show that networks constructed from the security model are provably secure against any attacks of small sizes, for which natural selections of homophyly, randomness and preferential attachment are the underlying mechanisms.
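The abstract names a uniform threshold cascading failure model without spelling out its rule. The sketch below assumes the standard threshold cascade: a node fails once the fraction of its failed neighbours reaches a common threshold, and the process repeats until nothing changes. The function name, the threshold semantics and the example graph are assumptions for illustration, not the paper's definitions.

```python
def cascade(adj, seeds, threshold):
    """Return the set of failed nodes after a uniform-threshold cascade.

    Assumed rule: a node fails when at least `threshold` of its neighbours
    have already failed. `seeds` is the initially attacked set.
    """
    failed = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in adj.items():
            if v in failed or not nbrs:
                continue  # already failed, or isolated and cannot be infected
            failed_fraction = sum(1 for u in nbrs if u in failed) / len(nbrs)
            if failed_fraction >= threshold:
                failed.add(v)
                changed = True
    return failed

# Hypothetical example: a path 0-1-2-3-4 attacked at one end.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(sorted(cascade(path5, {0}, 0.5)))  # the failure travels down the path
print(sorted(cascade(path5, {0}, 0.6)))  # a higher threshold contains it
```

Under this assumed rule, a random threshold variant would draw a separate threshold for each node instead of sharing one value.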

preprint, 2012, arXiv

Algorithmic Aspects of Homophyly of Networks

We investigate the algorithmic problems of the homophyly phenomenon in networks. Given an undirected graph $G = (V, E)$ and a vertex coloring $c \colon V \rightarrow \{1, 2, \ldots, k\}$ of $G$, we say that a vertex $v\in V$ is happy if $v$ shares the same color with all its neighbors, and unhappy otherwise, and that an edge $e\in E$ is happy if its two endpoints have the same color, and unhappy otherwise. Supposing $c$ is a partial vertex coloring of $G$, we define the Maximum Happy Vertices problem (MHV, for short) as coloring all the remaining vertices such that the number of happy vertices is maximized, and the Maximum Happy Edges problem (MHE, for short) as coloring all the remaining vertices such that the number of happy edges is maximized. Let $k$ be the number of colors allowed in the problems. We show that both MHV and MHE can be solved in polynomial time if $k = 2$, and that both MHV and MHE are NP-hard if $k \geq 3$. We devise a $\max\{1/k, Ω(Δ^{-3})\}$-approximation algorithm for the MHV problem, where $Δ$ is the maximum degree of vertices in the input graph, and a 1/2-approximation algorithm for the MHE problem. This is the first theoretical progress on these two natural and fundamental new problems.
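The happy vertex and happy edge definitions above translate directly into code. The sketch below counts both for a full coloring and then solves tiny MHV and MHE instances by trying every completion of a partial coloring. The function names are illustrative, and the exhaustive search is exponential in the number of uncolored vertices; it is neither the polynomial-time algorithm for $k = 2$ nor either approximation algorithm from the paper.

```python
from itertools import product

def count_happy(adj, color):
    """Return (happy vertices, happy edges) for a full coloring."""
    # A vertex with no neighbours is vacuously happy.
    happy_vertices = sum(
        1 for v in adj if all(color[u] == color[v] for u in adj[v])
    )
    # Each monochromatic edge is seen from both endpoints.
    happy_edges = sum(
        1 for v in adj for u in adj[v] if color[u] == color[v]
    ) // 2
    return happy_vertices, happy_edges

def best_completion(adj, partial, k, objective):
    """Exhaustively extend `partial` with colors 1..k.

    `objective` is 0 to maximize happy vertices (MHV) or 1 for happy edges (MHE).
    """
    free = [v for v in adj if v not in partial]
    best_score, best_coloring = -1, None
    for choice in product(range(1, k + 1), repeat=len(free)):
        coloring = {**partial, **dict(zip(free, choice))}
        score = count_happy(adj, coloring)[objective]
        if score > best_score:
            best_score, best_coloring = score, coloring
    return best_score, best_coloring

# Hypothetical example: a path 0-1-2-3 whose endpoints are precolored differently.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
partial = {0: 1, 3: 2}

print(best_completion(path4, partial, 2, 0))  # MHV on the toy instance
print(best_completion(path4, partial, 2, 1))  # MHE on the toy instance
```

Because the two endpoints are forced to different colors, at least one edge on the path must be unhappy, so neither objective can exceed 2 on this instance.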

preprint, 2011, arXiv

The Small-Community Phenomenon in Networks

We investigate several geometric models of networks that simultaneously have some nice global properties: the small diameter property; the small-community phenomenon, which was defined by the authors (2011) to capture the common experience that (almost) everyone in our society belongs to some meaningful small communities; and, under certain conditions on the parameters, the power law degree distribution, which significantly strengthens the results given by van den Esker (2008) and Jordan (2010). The results above, together with our previous progress in Li and Peng (2011), build a mathematical foundation for the study of communities and the small-community phenomenon in various networks. In the proof of the power law degree distribution, we develop the method of alternating concentration analysis to build concentration inequalities by alternately and iteratively applying both the sub- and super-martingale inequalities, which seems powerful and may have more potential applications.
