Researcher profile

Lu Yin

Lu Yin contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
13 works
0 followers
9 topics
4 close collaborators

Published work

13 published item(s)

preprint · 2026 · arXiv

ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity

Large Language Models (LLMs) have achieved remarkable capabilities, but their immense computational demands during training remain a critical bottleneck for widespread adoption. Low-rank training has received attention in recent years due to its ability to significantly reduce training memory usage. Meanwhile, applying 2:4 structured sparsity to weights and activations, to leverage NVIDIA GPU support for the 2:4 structured sparse format, has become a promising direction. However, existing low-rank methods often leave activation matrices full-rank, which dominates memory consumption and limits throughput during large-batch training. Furthermore, directly applying sparsity to weights often leads to non-negligible performance degradation. To achieve efficient pre-training of LLMs, this paper proposes ELAS (Efficient pre-training of Low-rank LLMs via 2:4 Activation Sparsity), a novel framework that applies 2:4 activation sparsity to low-rank models. ELAS applies squared ReLU activation functions to the feed-forward networks in low-rank models and implements 2:4 structured sparsity on the activations after the squared ReLU operation. We evaluated ELAS through pre-training experiments on LLaMA models ranging from 60M to 1B parameters. The results demonstrate that ELAS maintains performance with minimal degradation after applying 2:4 activation sparsity, while achieving training and inference acceleration. Moreover, ELAS reduces activation memory overhead, particularly with large batch sizes. Code is available at ELAS Repo.
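
The activation path described in this abstract (squared ReLU followed by 2:4 structured sparsity) can be illustrated with a minimal NumPy sketch. It is not the ELAS implementation: the function names are hypothetical, and selecting the two largest-magnitude entries in each group of four is an assumption about how the 2:4 pattern is chosen.

```python
import numpy as np

def squared_relu(x):
    """Squared ReLU: max(x, 0) ** 2, applied elementwise."""
    return np.maximum(x, 0.0) ** 2

def sparsify_2_4(x):
    """Zero the 2 smallest-magnitude entries in every contiguous group of 4
    along the last axis, so at most 2 of every 4 values are non-zero.
    (Assumed selection rule; the abstract does not specify it.)"""
    if x.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be divisible by 4")
    groups = x.reshape(-1, 4)
    # Column indices of the two smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1, kind="stable")[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(x.shape)

# Toy feed-forward hidden activations: one token, eight hidden units.
hidden = np.array([[0.5, -1.0, 2.0, 3.0, 1.0, 0.2, -0.3, 0.1]])
sparse_act = sparsify_2_4(squared_relu(hidden))
```

This only emulates the sparsity pattern with dense arrays. Any speed-up would come from hardware sparse kernels, which the sketch does not use.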

preprint · 2022 · arXiv

Gravitational waves from the vacuum decay with LISA

We investigate the gravitational wave spectrum resulting from a cosmological first-order phase transition. We compare two models: a scalar field model without gravitation and a scalar field model with gravitation. Based on the sensitivity curves of the LISA space-based interferometer for the stochastic gravitational-wave background, we compare the gravitational wave spectra that the bubble collision process produces in the two cases. In particular, we numerically calculate the speed of the bubble wall before collision for both models. We show that the difference between the amplitudes of those spectra can clearly distinguish between the two models. We expect that LISA, with a signal-to-noise ratio of 10, could observe the spectrum in the case of a fast first-order phase transition.

preprint · 2022 · arXiv

Hints of FLRW Breakdown from Supernovae

A 10% difference in the scale for the Hubble parameter constitutes a clear problem for cosmology. Here, considering the angular distribution of Type Ia supernovae (SN) within the Pantheon compilation and working within flat $Λ$CDM cosmology, we observe a correlation between higher $H_0$ and the CMB dipole direction, confirming our previous results for strongly-lensed quasars \cite{Krishnan:2021dyb}. Concretely, we record a $\sim 1$ km/s/Mpc variation in $H_0$ at antipodal points on the sky within the Pantheon sample, which is evident in the low-$z$ subsample ($z \lesssim 0.075$) and is enhanced by higher-redshift SN. Our work raises the possibility that we may be at the precision required to probe anisotropic Hubble expansions, while providing a concrete prediction for future inferences of $H_0$.

preprint · 2022 · arXiv

On Larger $H_0$ Values in the CMB Dipole Direction

On the assumption that quasars (QSO) and gamma-ray bursts (GRB) represent standardisable candles, we provide evidence that the Hubble constant $H_0$ adopts larger values in hemispheres aligned with the CMB dipole direction. If substantiated, this trend signals a departure from FLRW cosmology. In particular, QSOs show a definite trend, whereas our findings in GRBs are consistent with an isotropic Universe, but we show in a sample of GRBs calibrated with Type Ia supernovae (SN) that this conclusion may change as one focuses on GRBs more closely (mis)aligned with the CMB dipole direction. The statistical significance in QSOs alone is $\gtrsim 2 σ$ and when combined with similar trends in strong lensing, Type Ia SN and calibrated GRBs, this increases to $\sim 3 σ$. Our findings are consistent with reported discrepancies in the cosmic dipole and anisotropies in galaxy cluster scaling relations. The reported variations in $H_0$ across the sky suggest that Hubble tension may be a symptom of a deeper cosmological malaise.

preprint · 2022 · arXiv

Reducing the $H_0$ Tension with Exponential Acoustic Dark Energy

The Hubble tension arises from the discrepancy between late-time and early-Universe observations. We explore a new dark fluid model, called the exponential Acoustic Dark Energy (eADE) model, to relieve the Hubble tension. The eADE model adopts an exponential form for the equation of state (EoS) of the acoustic dark energy, the first time an exponential form has been explored for the EoS. In this model, the gravitational effects of the acoustic oscillations can impact the CMB phenomena at the epoch of matter-radiation equality. We constrain the eADE model with current cosmological datasets. The comparison with the standard model is shown through the CMB and matter power spectra. The fit of our model gives $H_0 = 70.06^{+1.13}_{-1.09}$ at 95% C.L. and a smaller best-fit value than $Λ$CDM.

preprint · 2022 · arXiv

Semantic-Based Few-Shot Learning by Interactive Psychometric Testing

Few-shot classification tasks aim to classify images in query sets based on only a few labeled examples in support sets. Most studies assume that each image in a task has a single, unique class association. Under this assumption, these algorithms may fail to identify the proper class assignment when there is no exact match between support and query classes, for example when asked to classify a tiger given only a few images of lions, bikes, and apples. In a more general setting, however, we could use a higher-level concept, large carnivores, to match the tiger to the lion for semantic classification. Existing studies have rarely considered this situation because label-based supervision is incompatible with complex conceptual relationships. In this work, we advance few-shot learning towards this more challenging scenario, semantic-based few-shot learning, and propose a method that addresses it by capturing the inner semantic relationships using interactive psychometric learning. Experimental results on the CIFAR-100 dataset show the superiority of our method over the baseline for semantic-based few-shot learning.

preprint · 2022 · arXiv

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Work on the lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) has drawn considerable attention to post-training pruning (iterative magnitude pruning) and before-training pruning (pruning at initialization). The former suffers from an extremely large computation cost, and the latter usually struggles with insufficient performance. In comparison, during-training pruning, a class of pruning methods that offers both training/inference efficiency and comparable performance, has so far been less explored. To better understand during-training pruning, we quantitatively study the effect of pruning throughout training from the perspective of pruning plasticity (the ability of the pruned networks to recover the original performance). Pruning plasticity can help explain several other empirical observations about neural network pruning in the literature. We further find that pruning plasticity can be substantially improved by injecting a brain-inspired mechanism called neuroregeneration, i.e., regenerating the same number of connections as were pruned. We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), that advances the state of the art. Perhaps most impressively, its sparse-to-sparse version for the first time boosts sparse-to-sparse training performance over various dense-to-sparse methods with ResNet-50 on ImageNet without extending the training time. We release all code at https://github.com/Shiweiliuiiiiiii/GraNet.
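
The neuroregeneration idea in this abstract (prune some connections, then regenerate the same number) can be sketched on a flat weight vector. This is a hypothetical illustration and not the GraNet code: the function name is invented, and regrowing the inactive connections with the largest gradient magnitude is an assumed criterion that the abstract does not state.

```python
import numpy as np

def neuroregeneration_step(weights, mask, grads, k):
    """Prune the k smallest-magnitude active weights, then regenerate k
    connections that were inactive before this step, leaving the overall
    sparsity unchanged.

    weights, grads: 1-D float arrays; mask: 1-D bool array (True = active).
    Regrowth by largest |gradient| is an assumption made for this sketch.
    """
    weights, mask = weights.copy(), mask.copy()
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)  # regrowth candidates
    k = min(k, active.size, inactive.size)

    # Magnitude pruning among currently active connections.
    prune_order = np.argsort(np.abs(weights[active]), kind="stable")
    prune_idx = active[prune_order[:k]]

    # Regeneration among previously inactive connections.
    grow_order = np.argsort(-np.abs(grads[inactive]), kind="stable")
    grow_idx = inactive[grow_order[:k]]

    mask[prune_idx] = False
    weights[prune_idx] = 0.0
    mask[grow_idx] = True
    weights[grow_idx] = 0.0  # regenerated connections start from zero
    return weights, mask

weights = np.array([0.9, -0.05, 0.0, 0.4, 0.0, -0.7])
mask = np.array([True, True, False, True, False, True])
grads = np.array([0.1, 0.2, 0.8, 0.1, 0.3, 0.0])
new_weights, new_mask = neuroregeneration_step(weights, mask, grads, k=1)
```

In the toy call, the weakest active weight (index 1) is removed and the inactive connection with the largest gradient (index 2) is activated, so the number of active connections stays at four.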

preprint · 2022 · arXiv

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training

Recent work on sparse neural network training (sparse training) has shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch. Existing sparse training methods usually strive to find the best possible sparse subnetwork in a single run, without involving any expensive dense or pre-training steps. For instance, dynamic sparse training (DST) can reach performance competitive with dense training by iteratively evolving the sparse topology during the course of training. In this paper, we argue that it is better to allocate the limited resources to creating multiple low-loss sparse subnetworks and superposing them into a stronger one, instead of allocating all resources to finding an individual subnetwork. To achieve this, two desiderata are required: (1) efficiently producing many low-loss subnetworks, the so-called cheap tickets, within one training process limited to the standard training time used in dense training; (2) effectively superposing these cheap tickets into one stronger subnetwork. To corroborate our conjecture, we present a novel sparse training approach, termed Sup-tickets, which can satisfy the above two desiderata concurrently in a single sparse-to-sparse training process. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with existing sparse training methods and demonstrates consistent performance improvement.
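
One way to picture the second desideratum, superposing several cheap tickets into a single sparse subnetwork, is sketched below. The abstract does not say how the superposition is performed, so averaging the masked weights and then re-pruning by magnitude to a target density is an assumption, and `superpose_tickets` is a hypothetical name.

```python
import numpy as np

def superpose_tickets(tickets, density):
    """Combine sparse subnetworks ("tickets") into one sparse subnetwork.

    tickets: list of (weights, mask) pairs with identical shapes.
    density: fraction of weights to keep in the combined subnetwork.
    Averaging followed by magnitude re-pruning is an assumed procedure.
    """
    # Average the masked weights; connections absent from a ticket count as 0.
    averaged = np.mean([w * m for w, m in tickets], axis=0)

    # Keep only the largest-magnitude entries to restore the target density.
    keep = int(round(density * averaged.size))
    order = np.argsort(-np.abs(averaged).ravel(), kind="stable")
    mask = np.zeros(averaged.size, dtype=bool)
    mask[order[:keep]] = True
    mask = mask.reshape(averaged.shape)
    return averaged * mask, mask

ticket_a = (np.array([1.0, 0.0, 0.5, 0.0]), np.array([1, 0, 1, 0]))
ticket_b = (np.array([0.8, 0.2, 0.0, 0.0]), np.array([1, 1, 0, 0]))
combined, combined_mask = superpose_tickets([ticket_a, ticket_b], density=0.5)
```

The re-pruning step is what keeps the result as sparse as the individual tickets. Without it, the union of the two masks would be denser than either one.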

preprint · 2020 · arXiv

Constraints on a special running vacuum model

We study a special running vacuum model (RVM) with $Λ= 3 αH^2+3βH_0^4 H^{-2}+Λ_0$, where $α$, $β$ and $Λ_0$ are the model parameters and $H$ is the Hubble parameter. This RVM has non-analytic background solutions for the energy densities of matter and radiation, which can only be evaluated numerically. From the analysis of the CMB power spectrum and baryon acoustic oscillations, along with the prior $α>0$ to avoid a negative dark energy density, we find that $α<2.83\times 10^{-4}$ and $β=(-0.2^{+3.9}_{-4.5})\times 10^{-4}$ (95% C.L.). We show that the RVM fits the cosmological data comparably to $Λ$CDM. In addition, we relate the fluctuation amplitude $σ_8$ to the neutrino mass sum $Σm_ν$.
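
The vacuum term quoted in this abstract can be evaluated directly. The sketch below is only a transcription of the formula $Λ= 3 αH^2+3βH_0^4 H^{-2}+Λ_0$ into a function with a hypothetical name. The numerical inputs are illustrative placeholders in arbitrary units, not fitted values from the paper.

```python
def running_vacuum_lambda(H, H0, alpha, beta, lambda0):
    """Evaluate Lambda(H) = 3*alpha*H^2 + 3*beta*H0^4 / H^2 + Lambda_0.

    H and H0 must use the same units; lambda0 must carry units of H^2.
    """
    return 3.0 * alpha * H**2 + 3.0 * beta * H0**4 / H**2 + lambda0

# Today (H = H0) the expression reduces to 3*(alpha + beta)*H0^2 + Lambda_0.
lambda_today = running_vacuum_lambda(H=1.0, H0=1.0, alpha=0.1, beta=0.2, lambda0=0.5)

# At earlier times (larger H) the alpha term grows and the beta term decays.
lambda_early = running_vacuum_lambda(H=2.0, H0=1.0, alpha=0.1, beta=0.2, lambda0=0.5)
```

Setting `alpha` and `beta` to zero recovers a constant `lambda0`, which is the $Λ$CDM limit of this parametrization.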

preprint · 2020 · arXiv

DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation

Most SLAM algorithms are based on the assumption that the scene is static. In practice, however, most scenes are dynamic and contain moving objects, so these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects. We first detect and match interest points between successive frames using traditional SLAM methods. The interest points belonging to different motion models (including ego-motion and the motion models of rigid moving objects) are then segmented by a multi-model fitting approach. Based on the interest points belonging to the ego-motion, we estimate the trajectory of the camera and reconstruct the static background. The interest points belonging to the motion models of rigid moving objects are then used to estimate the objects' motion relative to the camera and to reconstruct the 3D models of the objects. We then transform the relative motion into the trajectories of the moving objects in the global reference frame. Finally, we fuse the 3D models of the moving objects into the 3D map of the environment by considering their motion trajectories, obtaining a 4D (3D + time) sequence. DymSLAM obtains information about the dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows a robot to be employed for high-level tasks, such as obstacle avoidance for dynamic objects. We conducted experiments in a real-world environment where both the camera and the objects were moving over a wide range.
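
The step that converts camera-relative object motion into trajectories in the global reference frame amounts to composing rigid transforms. The sketch below is a minimal illustration under assumed conventions and is not DymSLAM code: poses are 4x4 homogeneous matrices, `T_wc` maps camera coordinates to world coordinates, `T_co` maps object coordinates to camera coordinates, and both helper names are hypothetical.

```python
import numpy as np

def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform from a rotation about z and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def object_poses_in_world(camera_poses, relative_object_poses):
    """Compose per-frame camera poses (world <- camera) with object poses
    relative to the camera (camera <- object) to get world <- object."""
    return [T_wc @ T_co for T_wc, T_co in zip(camera_poses, relative_object_poses)]

# Frame 0: camera at (1, 0, 0), unrotated. Frame 1: same place, turned 90 degrees.
camera_poses = [make_pose(0.0, [1.0, 0.0, 0.0]),
                make_pose(np.pi / 2, [1.0, 0.0, 0.0])]
# In both frames the object sits 2 units along the camera's x axis.
relative_poses = [make_pose(0.0, [2.0, 0.0, 0.0]),
                  make_pose(0.0, [2.0, 0.0, 0.0])]

world_poses = object_poses_in_world(camera_poses, relative_poses)
object_track = np.array([T[:3, 3] for T in world_poses])
```

Although the object looks stationary relative to the camera in both frames, its world position moves from (3, 0, 0) to (1, 2, 0) because the camera itself rotated. This is why ego-motion has to be estimated before object trajectories can be expressed globally.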

preprint · 2020 · arXiv

Modified Cosmology Models from Thermodynamical Approach

We apply the first law of thermodynamics to the apparent horizon of the universe with the power-law corrected and non-extensive Tsallis entropies rather than the Bekenstein-Hawking one. We examine the cosmological properties of the two entropy models using the CosmoMC package. In particular, we perform the first numerical study of the cosmological observables with the power-law corrected entropy. We also show that the neutrino mass sum has a non-zero central value with a relaxed upper bound in the Tsallis entropy model compared with that in $Λ$CDM.

preprint · 2020 · arXiv

Multicomponent Dark Matter in the Light of CALET and DAMPE

In light of the latest measurements of the total $e^+ + e^-$ flux by the CALET and DAMPE experiments, we revisit the multicomponent leptonically decaying dark matter (DM) explanations of the cosmic-ray electron/positron excesses observed previously. In particular, we use single- and double-component DM models to explore the compatibility of the AMS-02 positron fraction with the new CALET or DAMPE data. It turns out that neither single- nor double-component DM models are able to fit the AMS-02 positron fraction and the DAMPE total $e^+ + e^-$ flux data simultaneously. On the other hand, for the combined AMS-02 and CALET dataset, both the single- and double-component DM models can provide reasonable fits. If we further take into account the diffuse $γ$-ray constraints from Fermi-LAT, only the double-component DM models are allowed.

preprint · 2020 · arXiv

Reducing the $H_{0}$ tension with generalized Proca theory

We investigate the cosmological viability of the generalized Proca theory. We first implement the background and linear perturbation equations of motion in a Boltzmann code and then study the constraints on the parameters of the generalized Proca theory by running MCMC against the cosmological data set. With Planck + HST data, we obtain the constraint $h=0.7334_{-0.0269}^{+0.0246}$, which indicates that the tension between the early and late-time universe is removed within this theory. By adding other late-time data sets (BAO, RSD, etc.), we show that the tension is reduced, as the 2$σ$ allowed region for $h$ in Proca, $h=0.7041_{-0.0087}^{+0.0094}$, overlaps with the 2$σ$ region of the HST data.