Source author record

Huijuan Wang

Huijuan Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

physics.soc-ph, Social and Information Networks, astro-ph.IM, Machine Learning, Artificial Intelligence, astro-ph.HE, Computation and Language, cond-mat.other, nlin.PS, physics.data-an, Populations and Evolution

Catalog footprint

What is connected

17 works

11 topics

4 close collaborators

preprint 2026 arXiv

EMO: Frustratingly Easy Progressive Training of Extendable MoE

Sparse Mixture-of-Experts (MoE) models offer a powerful way to scale model size without increasing compute, as per-token FLOPs depend only on k active experts rather than the total pool of E experts. Yet, this asymmetry creates an MoE efficiency paradox in practice: adding more experts balloons memory and communication costs, making actual training inefficient. We argue that this bottleneck arises in part because current MoE training allocates too many experts from the beginning, even though early-stage data may not fully utilize such capacity. Motivated by this, we propose EMO, a simple progressive training framework that treats MoE capacity as expandable memory and grows the expert pool over the course of training. EMO explicitly models sparsity in scaling law to derive stage-wise compute-optimal token budgets for progressive expansion. Empirical results show that EMO matches the performance of a fixed-expert setup in large-scale experiments while improving wall-clock efficiency. It offers a surprisingly simple yet effective path to scalable MoE training, preserving the benefits of large expert pools while reducing both training time and GPU cost.

preprint 2026 arXiv

GQA-μP: The maximal parameterization update for grouped query attention

Hyperparameter transfer across model architectures dramatically reduces the amount of compute necessary for tuning large language models (LLMs). The maximal update parameterization (μP) ensures transfer through principled mathematical analysis but can be challenging to derive for new model architectures. Building on the spectral feature-learning view of Yang et al. (2023a), we make two advances. First, we promote spectral norm conditions on the weights from a heuristic to the definition of feature learning, and as a consequence arrive at the Complete-P depth and weight-decay scalings without recourse to lazy-learning. Second, we consider a modified spectral norm that preserves the valid scaling law of network weights when weight matrices are not full rank. This enables (to our knowledge, the first) derivation of μP scalings for grouped-query attention (GQA). We demonstrate the efficacy of our theoretical derivations by showing learning rate transfer across the GQA repetition hyperparameter as well as experiments regarding transfer over weight decay.

preprint 2022 arXiv

Simple and Effective Relation-based Embedding Propagation for Knowledge Representation Learning

Relational graph neural networks have garnered particular attention for encoding graph context in knowledge graphs (KGs). Although they have achieved competitive performance on small KGs, how to efficiently and effectively utilize graph context for large KGs remains an open problem. To this end, we propose the Relation-based Embedding Propagation (REP) method. It is a post-processing technique to adapt pre-trained KG embeddings with graph context. As relations in KGs are directional, we model the incoming head context and the outgoing tail context separately. Accordingly, we design relational context functions with no external parameters. Besides, we use averaging to aggregate context information, making REP more computation-efficient. We theoretically prove that such designs can avoid information distortion during propagation. Extensive experiments also demonstrate that REP has significant scalability while improving or maintaining prediction quality. Notably, on average it brings about a 10% relative improvement to triplet-based embedding methods on OGBL-WikiKG2 and takes 5%-83% of the time to achieve results comparable to the state-of-the-art GC-OTE.

preprint 2022 arXiv

Topological-temporal properties of evolving networks

Many real-world complex systems including human interactions can be represented by temporal (or evolving) networks, where links activate or deactivate over time. Characterizing temporal networks is crucial to compare such systems and to study the dynamical processes unfolding on them. A systematic method to characterize simultaneously the temporal and topological relations of active links (also called contacts or events), in order to compare different real-world networks and to detect their common patterns or differences, is still missing. In this paper, we propose a method to characterize to what extent contacts that happen close in time also occur close in topology. Specifically, we study the interrelation between temporal and topological properties of contacts from three perspectives: (1) the autocorrelation of the time series recording the total number of contacts that happened at each time step in a network; (2) the interplay between the topological distance and interevent time of two contacts; (3) the temporal correlation of contacts within local neighborhoods beyond a node pair. By applying our method to 13 real-world temporal networks, we found that temporal-topological correlation of contacts is more evident in virtual contact networks than in physical contact ones. This could be due to the lower cost and easier access of online communications compared with physical interactions, allowing and possibly facilitating social contagion, i.e., interactions of one individual may influence the activity of its neighbors. We also identify different patterns between virtual and physical networks and among physical contact networks at, e.g., school and workplace, in the formation of correlation in local neighborhoods. Detected patterns and differences may further inspire the development of more realistic temporal network models that could jointly reproduce temporal and topological properties of contacts.

preprint 2020 arXiv

SN 2018zd: An Unusual Stellar Explosion as Part of the Diverse Type II Supernova Landscape

We present extensive observations of SN 2018zd covering the first $\sim450$\,d after the explosion. This SN shows a possible shock-breakout signal $\sim3.6$\,hr after the explosion in the unfiltered light curve, and prominent flash-ionisation spectral features within the first week. The unusual photospheric temperature rise (rapidly from $\sim 12,000$\,K to above 18,000\,K) within the earliest few days suggests that the ejecta were continuously heated. Both the significant temperature rise and the flash spectral features can be explained with the interaction of the SN ejecta with the massive stellar wind ($0.18^{+0.05}_{-0.10}\, \rm M_{\odot}$), which accounts for the luminous peak ($L_{\rm max} = [1.36\pm 0.63] \times 10^{43}\, \rm erg\,s^{-1}$) of SN 2018zd. The luminous peak and low expansion velocity ($v \approx 3300$ km s$^{-1}$) make SN 2018zd resemble a member of the LLEV (luminous SNe II with low expansion velocities) events, which originate from circumstellar interaction. The relatively fast post-peak decline allows a classification of SN 2018zd as a transition event morphologically linking SNe~IIP and SNe~IIL. In the radioactive-decay phase, SN 2018zd experienced a significant flux drop and behaved more like a low-luminosity SN~IIP both spectroscopically and photometrically. This contrast indicates that circumstellar interaction plays a vital role in modifying the observed light curves of SNe~II. Comparing nebular-phase spectra with model predictions suggests that SN 2018zd arose from a star of $\sim 12\,\rm M_{\odot}$. Given the relatively small amount of $^{56}$Ni ($0.013 - 0.035 \rm M_{\odot}$), the massive stellar wind, and the faint X-ray radiation, the progenitor of SN 2018zd could be a massive asymptotic giant branch star which collapsed owing to electron capture.

preprint 2020 arXiv

The backbone-residual model. Accurately characterising the instrumental profile of a fibre-fed echelle spectrograph

Context: The instrumental profile (IP) is a basic property of a spectrograph. Accurate IP characterisation is a prerequisite for an accurate wavelength solution. It also facilitates new spectral acquisition methods such as forward modeling and deconvolution. Aims: We investigate an IP modeling method for the fibre-fed echelle spectrograph with the emission lines of the ThAr lamp, and explore a method to evaluate the accuracy of IP characterisation. Methods: The backbone-residual (BR) model, which is the sum of the backbone function and the residual function, is put forward and tested on the fibre-fed High Resolution Spectrograph (HRS) at the Chinese Xinglong 2.16-m Telescope. The backbone function is a bell-shaped function that describes the main component and the spatial variation of the IP. The residual function, which is expressed as a cubic spline function, accounts for the difference between the bell-shaped function and the actual IP. The method of evaluating the accuracy of IP characterisation is based on spectral reconstruction and Monte Carlo simulation. Results: The IP of HRS is characterised with the BR model, and the accuracy of the characterised IP reaches 0.006 of the peak value of the backbone function. This result demonstrates that accurate IP characterisation has been achieved on HRS with the BR model, and that the BR model is an excellent choice for accurate IP characterisation of fibre-fed echelle spectrographs.

preprint 2016 arXiv

SIS Epidemic Spreading with Heterogeneous Infection Rates

In this work, we aim to understand the influence of the heterogeneity of infection rates on Susceptible-Infected-Susceptible (SIS) epidemic spreading. Employing the classic SIS model as the benchmark, we study the influence of independent and identically distributed infection rates on the average fraction of infected nodes in the metastable state. The log-normal, gamma and a newly designed distribution are considered for the infection rates. We find that, when the recovery rate is small, i.e. the epidemic spreads out in both homogeneous and heterogeneous cases: 1) the heterogeneity of infection rates on average retards the virus spreading, and 2) a larger even-order moment of the infection rates leads to a smaller average fraction of infected nodes, but the odd-order moments contribute in the opposite way; when the recovery rate is large, i.e. the epidemic may die out or infect a small fraction of the population, the heterogeneity of infection rates may enhance the probability that the epidemic spreads out. Finally, we verify our conclusions via real-world networks with their heterogeneous infection rates. Our results suggest that, in reality, the epidemic spread may not be as severe as the classic SIS model indicates, but eliminating the epidemic is probably more difficult.

preprint 2016 arXiv

The Accuracy of Mean-Field Approximation for Susceptible-Infected-Susceptible Epidemic Spreading

Epidemic spreading has been studied for years by applying the mean-field approach in both the homogeneous case, where each node may get infected by an infected neighbor with the same rate, and the heterogeneous case, where the infection rates between different pairs of nodes are different. Researchers have discussed whether the mean-field approaches could accurately describe the epidemic spreading in the homogeneous case, but not in the heterogeneous case. In this paper, we explore under what conditions the mean-field approach could perform well when the infection rates are heterogeneous. In particular, we employ the Susceptible-Infected-Susceptible (SIS) model and compare the average fraction of infected nodes in the metastable state obtained by the continuous-time simulation and the mean-field approximation. We concentrate on an individual-based mean-field approximation called the N-intertwined Mean Field Approximation (NIMFA), which is an advanced approach that takes the underlying network topology into account. Moreover, we consider not only the independent and identically distributed (i.i.d.) infection rate but also the infection rate correlated with the degree of the two end nodes. We conclude that NIMFA is generally more accurate when the prevalence of the epidemic is higher. Given the same effective infection rate, NIMFA is less accurate when the variance of the i.i.d. infection rate or the correlation between the infection rate and the nodal degree leads to a lower prevalence. Moreover, given the same actual prevalence, NIMFA performs better in the cases: 1) when the variance of the i.i.d. infection rates is smaller (while the average is unchanged); 2) when the correlation between the infection rate and the nodal degree is positive.
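
As a rough illustration of the mean-field side of this comparison, the sketch below iterates the standard NIMFA steady-state equation, v_i = 1 - 1/(1 + sum_j beta_ij v_j / delta), for link-dependent infection rates. The function names, the fixed-point iteration and the complete-graph example are assumptions made for illustration, not the procedure used in the paper.

```python
def nimfa_steady_state(rates, recovery_rate, iterations=10000, tol=1e-12):
    """Iterate the NIMFA steady-state equation to a fixed point.

    rates[i][j] is the infection rate from node j to node i (0.0 if no link).
    Returns the approximate infection probability of every node.
    """
    n = len(rates)
    infected = [1.0] * n  # start from the all-infected state
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Total infection pressure on node i, scaled by the recovery rate.
            pressure = sum(rates[i][j] * infected[j] for j in range(n)) / recovery_rate
            updated.append(1.0 - 1.0 / (1.0 + pressure))
        if max(abs(a - b) for a, b in zip(updated, infected)) < tol:
            return updated
        infected = updated
    return infected


def prevalence(rates, recovery_rate):
    """Average fraction of infected nodes in the mean-field steady state."""
    probabilities = nimfa_steady_state(rates, recovery_rate)
    return sum(probabilities) / len(probabilities)


# Hypothetical example: complete graph on 4 nodes, every link with rate 1.0.
complete4 = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
homogeneous = prevalence(complete4, 1.0)
```

Replacing the entries of `rates` with values drawn from a chosen distribution (keeping the matrix pattern of the network) gives the heterogeneous case; comparing the result against a stochastic simulation is left out of this sketch.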

preprint 2016 arXiv

The Xinglong 2.16-m Telescope: Current Instruments and Scientific Projects

The Xinglong 2.16-m reflector is the first 2-meter class astronomical telescope in China. It was jointly designed and built by the Nanjing Astronomical Instruments Factory (NAIF), Beijing Astronomical Observatory (now National Astronomical Observatories, Chinese Academy of Sciences, NAOC) and Institute of Automation, Chinese Academy of Sciences in 1989. It is a Ritchey-Chrétien (R-C) reflector on an English equatorial mount, and the effective aperture is 2.16 meters. It had been the largest optical telescope in China for $\sim18$ years until the Guoshoujing Telescope (also called Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) and the Lijiang 2.4-m telescope were built. At present, there are three main instruments available on the Cassegrain focus: the Beijing Faint Object Spectrograph and Camera (BFOSC) for direct imaging and low resolution ($R\sim500-2000$) spectroscopy, the spectrograph made by Optomechanics Research Inc. (OMR) for low resolution spectroscopy (the spectral resolutions are similar to those of BFOSC) and the fiber-fed High Resolution Spectrograph (HRS, $R\sim30000-65000$). The telescope is widely open to astronomers all over China as well as international astronomical observers. Each year there are more than 40 ongoing observing projects, including 6-8 key projects. Recently, some new techniques and instruments (e.g., astro-frequency comb calibration system, polarimeter and adaptive optics) have been or will be tested on the telescope to extend its observing abilities.

preprint 2014 arXiv

Correlation between centrality metrics and their application to the opinion model

In recent decades, a number of centrality metrics describing network properties of nodes have been proposed to rank the importance of nodes. In order to understand the correlations between centrality metrics and to approximate a high-complexity centrality metric by a strongly correlated low-complexity metric, we first study the correlation between centrality metrics in terms of their Pearson correlation coefficient and their similarity in ranking of nodes. In addition to considering the widely used centrality metrics, we introduce a new centrality measure, the degree mass. The m order degree mass of a node is the sum of the weighted degree of the node and its neighbors no further than m hops away. We find that the B_{n}, the closeness, and the components of x_{1} are strongly correlated with the degree, the 1st-order degree mass and the 2nd-order degree mass, respectively, in both network models and real-world networks. We then theoretically prove that the Pearson correlation coefficient between x_{1} and the 2nd-order degree mass is larger than that between x_{1} and a lower order degree mass. Finally, we investigate the effect of the inflexible antagonists selected based on different centrality metrics in helping one opinion to compete with another in the inflexible antagonists opinion model. Interestingly, we find that selecting the inflexible antagonists based on the leverage, the B_{n}, or the degree is more effective in opinion-competition than using other centrality metrics in all types of networks. This observation is supported by our previous observations, i.e., that there is a strong linear correlation between the degree and the B_{n}, as well as a high centrality similarity between the leverage and the degree.
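
The degree mass is described above only in words, so the following is a minimal sketch under one reading of that description: the m-th order degree mass is taken to be the sum of A^k u for k = 1 to m+1, where A is the adjacency matrix and u the all-ones vector, so that order 0 is the plain degree. That formula, the function name `degree_mass` and the path-graph example are assumptions for illustration.

```python
def degree_mass(adjacency, order):
    """Return the assumed order-th degree mass of every node.

    Computes sum_{k=1}^{order+1} A^k u, with u the all-ones vector, so
    order 0 gives the degree and order 1 adds the neighbors' degrees.
    """
    n = len(adjacency)
    walk_counts = [1.0] * n  # u, the all-ones vector
    mass = [0.0] * n
    for _ in range(order + 1):
        # Multiply by A once more: walk_counts becomes A^k u.
        walk_counts = [
            sum(adjacency[i][j] * walk_counts[j] for j in range(n))
            for i in range(n)
        ]
        mass = [total + count for total, count in zip(mass, walk_counts)]
    return mass


# Hypothetical example: a path graph 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
zeroth = degree_mass(path, 0)
first = degree_mass(path, 1)
```

Under this reading the cost grows linearly with the order, which is what makes the degree mass attractive as a low-complexity stand-in for metrics such as closeness or the principal eigenvector.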

preprint 2014 arXiv

Heterogeneous Recovery Rates against SIS Epidemics in Directed Networks

The nodes in communication networks are most likely equipped with different recovery resources, which allow them to recover from a virus at different rates. In this paper, we aim to understand how to allocate the limited recovery resources to efficiently prevent the spreading of epidemics. We study the susceptible-infected-susceptible (SIS) epidemic model on directed scale-free networks. In the classic SIS model, a susceptible node can be infected by an infected neighbor with the infection rate $β$ and an infected node can recover to be susceptible again with the recovery rate $δ$. In the steady state a fraction $y_\infty$ of nodes are infected, which shows how severely the network is infected. We propose to allocate the recovery rate $δ_i$ for node $i$ according to its indegree and outdegree, $δ_i \sim k_{i,in}^{α_{in}} k_{i,out}^{α_{out}}$, given the finite average recovery rate $\langle δ \rangle$ representing the limited recovery resources over the whole network. We find that, by tuning the two scaling exponents $α_{in}$ and $α_{out}$, we can always reduce the infection fraction $y_\infty$, thus reducing the extent of infections, compared to the homogeneous allocation of recovery rates. Moreover, we can find our optimal strategy via the optimal choice of the exponents $α_{in}$ and $α_{out}$. Our optimal strategy indicates that when the recovery resources are sufficient, more resources should be allocated to the nodes with a larger indegree or outdegree, but when the recovery resources are very limited, only the nodes with a larger outdegree should be equipped with more resources. We also find that our optimal strategy works better when the recovery resources are sufficient but not yet able to make the epidemic die out, and when the indegree-outdegree correlation is small.
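
The allocation rule can be sketched as follows: each node gets a weight k_in^alpha_in * k_out^alpha_out, and the weights are rescaled so that the mean recovery rate equals the prescribed average. The function name, argument names and example degrees are hypothetical; only the proportionality and the fixed average come from the abstract.

```python
def allocate_recovery_rates(in_degrees, out_degrees, alpha_in, alpha_out, average_rate):
    """Assign recovery rates proportional to k_in**alpha_in * k_out**alpha_out.

    The rates are rescaled so that their mean equals average_rate, which
    models the fixed recovery budget. Zero degrees combined with negative
    exponents are not handled in this sketch.
    """
    weights = [
        (k_in ** alpha_in) * (k_out ** alpha_out)
        for k_in, k_out in zip(in_degrees, out_degrees)
    ]
    scale = average_rate * len(weights) / sum(weights)
    return [scale * weight for weight in weights]


# Hypothetical example: three nodes, resources follow the indegree only.
rates = allocate_recovery_rates(
    in_degrees=[1, 2, 3],
    out_degrees=[3, 2, 1],
    alpha_in=1.0,
    alpha_out=0.0,
    average_rate=1.0,
)
```

Setting both exponents to zero recovers the homogeneous allocation, which is the baseline the heterogeneous strategies are compared against.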

preprint 2014 arXiv

Non-consensus opinion model on directed networks

Dynamic social opinion models have been widely studied on undirected networks, and most of them are based on spin interaction models that produce a consensus. In reality, however, many networks such as Twitter and the World Wide Web are directed and are composed of both unidirectional and bidirectional links. Moreover, from choosing a coffee brand to deciding who to vote for in an election, two or more competing opinions often coexist. In response to this ubiquity of directed networks and the coexistence of two or more opinions in decision-making situations, we study a non-consensus opinion model introduced by Shao et al. on directed networks. We define directionality $ξ$ as the percentage of unidirectional links in a network, and we use the linear correlation coefficient $ρ$ between the indegree and outdegree of a node to quantify the relation between the indegree and outdegree. We introduce two degree-preserving rewiring approaches which allow us to construct directed networks that can have a broad range of possible combinations of directionality $ξ$ and linear correlation coefficient $ρ$ and to study how $ξ$ and $ρ$ impact opinion competitions. We find that, as the directionality $ξ$ or the indegree and outdegree correlation $ρ$ increases, the majority opinion becomes more dominant and the minority opinion's ability to survive is lowered.

preprint 2013 arXiv

Effect of the Interconnected Network Structure on the Epidemic Threshold

Most real-world networks are not isolated. In order to function fully, they are interconnected with other networks, and this interconnection influences their dynamic processes. For example, when the spread of a disease involves two species, the dynamics of the spread within each species (the contact network) differs from that of the spread between the two species (the interconnected network). We model two generic interconnected networks using two adjacency matrices, A and B, in which A is a 2N*2N matrix that depicts the connectivity within each of two networks of size N, and B a 2N*2N matrix that depicts the interconnections between the two. Using an N-intertwined mean-field approximation, we determine that a critical susceptible-infected-susceptible (SIS) epidemic threshold in two interconnected networks is 1/λ1(A+αB), where the infection rate is β within each of the two individual networks and αβ in the interconnected links between the two networks, and λ1(A+αB) is the largest eigenvalue of the matrix A+αB. In order to determine how the epidemic threshold depends on the structure of interconnected networks, we analytically derive λ1(A+αB) using a perturbation approximation for small and large α, and the lower and upper bounds for any α as a function of the adjacency matrices of the two individual networks, the interconnections between the two, and their largest eigenvalues/eigenvectors. We verify these approximations and boundary values for λ1(A+αB) using numerical simulations, and determine how component network features affect λ1(A+αB).
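
To make the threshold 1/λ1(A+αB) concrete, here is a small sketch that forms A+αB and estimates its largest eigenvalue by shifted power iteration. The helper names, the identity shift (used so the iteration also settles on bipartite graphs) and the four-node example are assumptions for illustration; the paper derives λ1 analytically through perturbation approximations and bounds.

```python
def largest_eigenvalue(matrix, iterations=10000, tol=1e-12):
    """Estimate the largest eigenvalue of a nonnegative matrix.

    Power iteration is run on matrix + I; the shift of 1 is removed
    from the returned estimate.
    """
    n = len(matrix)
    vector = [1.0 / n] * n  # positive start vector with entries summing to 1
    estimate = 0.0
    for _ in range(iterations):
        product = [
            sum(matrix[i][j] * vector[j] for j in range(n)) + vector[i]
            for i in range(n)
        ]
        growth = sum(product)  # approaches lambda_1 + 1 since sum(vector) == 1
        vector = [x / growth for x in product]
        if abs(growth - 1.0 - estimate) < tol:
            return growth - 1.0
        estimate = growth - 1.0
    return estimate


def epidemic_threshold(a, b, alpha):
    """Return 1 / lambda_1(A + alpha * B) for equally sized square matrices."""
    n = len(a)
    combined = [[a[i][j] + alpha * b[i][j] for j in range(n)] for i in range(n)]
    return 1.0 / largest_eigenvalue(combined)


# Hypothetical example with N = 2: networks {0, 1} and {2, 3}, one link each.
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
# Interconnections 0-2 and 1-3 between the two networks.
B = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
coupled = epidemic_threshold(A, B, 0.5)
isolated = epidemic_threshold(A, B, 0.0)
```

In this toy example every row of A+αB sums to 1+α, so λ1 = 1+α and the threshold drops from 1 to 1/1.5 once the interconnections are switched on with α = 0.5.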

preprint 2013 arXiv

Epidemic threshold in directed networks

Epidemics have so far been mostly studied in undirected networks. However, many real-world networks, such as the social network Twitter and the WWW networks, upon which information, emotion or malware spreads, are shown to be directed networks, composed of both unidirectional links and bidirectional links. We define the directionality as the percentage of unidirectional links. The epidemic threshold for the susceptible-infected-susceptible (SIS) epidemic has been proved to be 1/lambda_{1} in directed networks by the N-intertwined Mean-field Approximation, where lambda_{1}, also called the spectral radius, is the largest eigenvalue of the adjacency matrix. Here, we propose two algorithms to generate directed networks with a given degree distribution, where the directionality can be controlled. The effect of directionality on the spectral radius lambda_{1}, principal eigenvector x_{1}, spectral gap (lambda_{1}-|lambda_{2}|) and algebraic connectivity |mu_{N-1}| is studied. Important findings are that the spectral radius lambda_{1} decreases with the directionality, and the spectral gap and the algebraic connectivity increase with the directionality. The extent of the decrease of the spectral radius depends on both the degree distribution and the degree-degree correlation rho_{D}. Hence, the epidemic threshold of directed networks is larger than that of undirected networks, and a random walk converges to its steady-state faster in directed networks than in undirected networks with degree distribution.

preprint 2012 arXiv

Non-consensus opinion models on complex networks

We focus on non-consensus opinion models in which above a certain threshold two opinions coexist in a stable relationship. We revisit and extend the non-consensus opinion (NCO) model introduced by Shao. We generalize the NCO model by adding a weight factor W to an individual's own opinion when determining its future opinion (NCOW model). We find that as W increases the minority opinion holders tend to form stable clusters with a smaller initial minority fraction compared to the NCO model. We also revisit another non-consensus opinion model, the inflexible contrarian opinion (ICO) model, which introduces inflexible contrarians to model a competition between two opinions in the steady state. In the ICO model, the inflexible contrarians effectively decrease the size of the largest cluster of the rival opinion. All of the above models have previously been explored in terms of a single network. However, opinions propagate not only within single networks but also between networks, so we study here the opinion dynamics in coupled networks. We apply the NCO rule on each individual network and the global majority rule on interdependent pairs. We find that the interdependent links effectively force the system from a second-order phase transition, which is characteristic of the NCO model on a single network, to a hybrid phase transition, i.e., a mix of second-order and abrupt jump-like transitions that ultimately becomes, as we increase the percentage of interdependent agents, a pure abrupt transition. We conclude that for the NCO model on coupled networks, interactions through interdependent links could push the non-consensus opinion type model to a consensus opinion type model, which mimics the reality that increased mass communication causes people to hold opinions that are increasingly similar.

preprint 2012 arXiv

The robustness of interdependent clustered networks

It was recently found that cascading failures can cause the abrupt breakdown of a system of interdependent networks. Using the percolation method developed for single clustered networks by Newman [Phys. Rev. Lett. 103, 058701 (2009)], we develop an analytical method for studying how clustering within the networks of a system of interdependent networks affects the system's robustness. We find that clustering significantly increases the vulnerability of the system, which is represented by the increased value of the percolation threshold $p_c$ in interdependent networks.

preprint 2011 arXiv

Competition of spatial and temporal instabilities under time delay near codimension-two Turing-Hopf bifurcations

Competition of spatial and temporal instabilities under time delay near the codimension-two Turing-Hopf bifurcations is studied in a reaction-diffusion equation. The time delay changes remarkably the oscillation frequency, the intrinsic wave vector, and the intensities of both Turing and Hopf modes. The application of appropriate time delay can control the competition between the Turing and Hopf modes. Analysis shows that individual or both feedbacks can realize the control of the transformation between the Turing and Hopf patterns. Two dimensional numerical simulations validate the analytical results.