Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
43 works
0 followers
25 topics
4 close collaborators

Published work

43 published item(s)

preprint, 2026, arXiv

AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

To stabilize the training of Large Language Models (LLMs), gradient clipping is a nearly ubiquitous heuristic used to alleviate exploding gradients. However, traditional global norm clipping erroneously presupposes gradient homogeneity across different functional modules, leading to an adverse "spill-over" effect where volatile parameters force unnecessary scaling on stable ones. To overcome this, we propose Adaptive Group-wise Gradient Clipping (AGGC). AGGC partitions parameters into groups based on functional types and regulates each according to its historical behavior using an Exponential Moving Average (EMA). Specifically, it constructs an adaptive interval to simultaneously mitigate gradient explosion and vanishing, while employing a time-dependent scheduling mechanism to balance exploration and convergence. Experiments on LLaMA 2-7B, Mistral-7B, and Gemma-7B models show that AGGC consistently outperforms LoRA and frequently surpasses Full Fine-Tuning. On the GSM8K benchmark, Mistral-7B fine-tuned with AGGC achieves an accuracy of 72.93%, exceeding LoRA's 69.5%. AGGC also effectively stabilizes Reinforcement Learning with Verifiable Rewards (RLVR), enhancing the logic deduction of Qwen 2.5 and Llama 3.2 models. Experimental results demonstrate that AGGC effectively addresses the limitations of traditional gradient clipping methods, particularly in overcoming gradient heterogeneity, by utilizing a modular, adaptive clipping strategy to stabilize the training process. Due to its lightweight design, AGGC can be seamlessly integrated into existing post-training pipelines with negligible overhead.
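The group-wise idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the group names, the EMA decay, and the interval factors `low` and `high` are hypothetical, and the time-dependent scheduling is omitted. Each group's gradient norm is kept inside an interval around its own running average, so a spike in one group does not rescale the others.

```python
import math


def group_norm(grads):
    """L2 norm of one parameter group's gradient (a flat list of floats)."""
    return math.sqrt(sum(g * g for g in grads))


class GroupEmaClipper:
    """Illustrative per-group clipping; every hyperparameter here is assumed."""

    def __init__(self, decay=0.9, low=0.5, high=2.0):
        self.decay = decay  # EMA decay for each group's norm history
        self.low = low      # lower interval factor (lifts very small gradients)
        self.high = high    # upper interval factor (caps very large gradients)
        self.ema = {}       # group name -> running average of the gradient norm

    def clip(self, grads_by_group):
        clipped = {}
        for name, grads in grads_by_group.items():
            norm = group_norm(grads)
            ema = self.ema.get(name)
            if ema is None or norm == 0.0:
                # No history yet, or nothing to rescale: leave the norm as is.
                target = norm
            else:
                # Keep the norm inside [low * ema, high * ema] for this group only.
                target = min(max(norm, self.low * ema), self.high * ema)
            scale = target / norm if norm > 0.0 else 1.0
            clipped[name] = [g * scale for g in grads]
            # Update the history with the norm that was actually applied.
            if ema is None:
                self.ema[name] = target
            else:
                self.ema[name] = self.decay * ema + (1.0 - self.decay) * target
        return clipped


clipper = GroupEmaClipper()
clipper.clip({"attention": [3.0, 4.0], "mlp": [0.3, 0.4]})          # builds history
step2 = clipper.clip({"attention": [30.0, 40.0], "mlp": [0.3, 0.4]})
print(step2)  # the "attention" spike is scaled down; "mlp" is left untouched
```

With a single global norm, the spike in the hypothetical "attention" group would also have shrunk the "mlp" gradient, which is the spill-over effect the abstract describes.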

preprint, 2026, arXiv

GPS-Synchronized Monitoring of Core-collapse Supernova Bursts with PandaX-4T via Coherent Elastic Neutrino Nuclear Scattering

The landmark detection of neutrinos from SN1987A marked the dawn of neutrino astrophysics. The neutrino burst provided essential insights into fundamental properties of neutrinos, and served as key probes of stellar evolution and supernova dynamics. The recent advancement in coherent elastic neutrino-nucleus scattering enables the detection of core-collapse supernova burst neutrinos using tonne-scale liquid xenon detectors originally designed for dark matter direct detection. Leveraging this capability, we developed and deployed an online supernova monitoring system for the PandaX-4T experiment. This system features a GPS module with millisecond-level timing precision, a low false-alarm rate, and high sensitivity to galactic core-collapse supernova explosion events. The methodology is robust, directly scalable, and planned for implementation in the next-generation PandaX-20T experiment.

preprint, 2026, arXiv

LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries

Personalization today is fundamentally platform-centric: services build user representations from the behavioral fragments they observe. Yet no platform can construct a complete picture of the user, as competitive incentives, legal constraints, user privacy concerns, and epistemic limits create persistent data barriers. This paper argues for a shift from platform-centric personalization to user-governed personalization, where only the user can integrate fragmented contexts across platforms and the offline world. The key asymmetry lies in data access: only users can aggregate their own cross-platform and offline information. Large language model (LLM) agents make such integration practically feasible for the first time by enabling reasoning over heterogeneous personal data and transforming users' cross-context information into actionable personalization capabilities. We provide proof-of-concept evidence that users equipped with cross-platform data exports and an off-the-shelf LLM agent can outperform single-platform personalization baselines. We conclude by outlining a research agenda for building scalable user-governed personalization systems.

preprint, 2026, arXiv

Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence

The de facto approach in video object-centric learning maintains temporal consistency through learned dynamics modules that predict future object representations, called slots. We demonstrate that these predictors function as expensive approximations of discrete correspondence problems. Modern self-supervised vision backbones already encode instance-discriminative features that distinguish objects reliably. Exploiting these features eliminates the need for learned temporal prediction. We introduce Grounded Correspondence, a framework that replaces learned transition functions with deterministic bipartite matching. Slots initialize from salient regions in frozen backbone features. Frame-to-frame identity is maintained through Hungarian matching on slot representations. The approach requires zero learnable parameters for temporal modeling yet achieves competitive performance on MOVi-D, MOVi-E, and YouTube-VIS. Project page: https://magenta-sherbet-85b101.netlify.app/
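As a rough illustration of frame-to-frame matching with no learned parameters, the sketch below finds the minimum-cost one-to-one assignment between slots of two consecutive frames. It is an assumption-laden stand-in: it uses exhaustive search over permutations instead of the Hungarian algorithm (both return an optimal assignment, but exhaustive search is only practical for a handful of slots), and the squared-distance cost and the toy slot vectors are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two slot vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_slots(prev_slots, curr_slots):
    """Return perm such that curr_slots[perm[i]] is matched to prev_slots[i].

    Assumes both frames have the same number of slots. Exhaustive search is
    used here for clarity; the Hungarian algorithm solves the same assignment
    problem in polynomial time.
    """
    n = len(prev_slots)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(sq_dist(prev_slots[i], curr_slots[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm)


prev_frame = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
curr_frame = [[5.1, 4.9], [0.1, 0.0], [0.9, 1.2]]  # same objects, shuffled order
print(match_slots(prev_frame, curr_frame))
```

Reordering the current slots by the returned permutation keeps each slot index attached to the same object over time, which is the role the abstract assigns to matching in place of a learned predictor.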

preprint, 2026, arXiv

SimFuzz: Similarity-guided Block-level Mutation for RISC-V Processor Fuzzing

The Instruction Set Architecture (ISA) defines processor operations and serves as the interface between hardware and software. As an open ISA, RISC-V lowers the barriers to processor design and encourages widespread adoption, but also exposes processors to security risks such as functional bugs. Processor fuzzing is a powerful technique for automatically detecting these bugs. However, existing fuzzing methods suffer from two main limitations. First, their emphasis on redundant test case generation causes them to overlook cross-processor corner cases. Second, they rely too heavily on coverage guidance. Current coverage metrics are biased and inefficient, and become ineffective once coverage growth plateaus. To overcome these limitations, we propose SimFuzz, a fuzzing framework that constructs a high-quality seed corpus from historical bug-triggering inputs and employs similarity-guided, block-level mutation to efficiently explore the processor input space. By introducing instruction similarity, SimFuzz expands the input space around seeds while preserving control-flow structure, enabling deeper exploration without relying on coverage feedback. We evaluate SimFuzz on three widely used open-source RISC-V processors: Rocket, BOOM, and XiangShan, and discover 17 bugs in total, including 14 previously unknown issues, 7 of which have been assigned CVE identifiers. These bugs affect the decode and memory units, cause instruction and data errors, and can lead to kernel instability or system crashes. Experimental results show that SimFuzz achieves up to 73.22% multiplexer coverage on the high-quality seed corpus. Our findings highlight critical security bugs in mainstream RISC-V processors and offer actionable insights for improving functional verification.

preprint, 2026, arXiv

Wind-fed Supermassive Black Hole Accretion in the Ultracompact Dwarf Galaxy M60-UCD1

Ultracompact dwarf galaxies (UCDs) are thought to be remnants of stripped galactic nuclei, among which a handful are known to host a central supermassive black hole (SMBH). As in stripped nuclear star clusters, the SMBHs in UCDs may be fed by stellar winds from old stellar populations, in the absence of substantial gas reservoirs and galactic inflows. In this work, we investigate such a wind-fed accretion scenario for M60-UCD1, which harbors a confirmed $2\times10^7~M_\odot$ SMBH and exhibits X-ray emission suggestive of an SMBH accretion signature. Using three-dimensional hydrodynamical simulations, we simulate the SMBH accreting stellar winds from approximately 1500 asymptotic giant branch stars, and explore the role of ram pressure from the ambient interstellar or intracluster medium. After 5 Myr, the majority of the stellar winds form a cold gas disk ($\sim1000~M_\odot$) within $\sim10~\rm pc$ as well as the SMBH's gravitational sphere of influence. Within the inner $10^4~r_{\rm g}$, this disk transitions into a hot ($\sim10^7-10^9~\rm K$), geometrically thick corona that dominates the X-ray emission. The SMBH achieves an accretion rate of $\sim10^{-5}~M_\odot~\rm yr^{-1}$, yielding an X-ray luminosity of $\sim7\times10^{37}~\rm erg~s^{-1}$, well consistent with observations. Including ram pressure stripping reduces both the accretion rate and luminosity by about a factor of two. Our results suggest that the X-ray counterpart of M60-UCD1 originates from a weakly accreting SMBH fed by stellar winds, with broader insights into the feeding mechanisms of central massive black holes and the origins of X-ray sources in other UCDs.

preprint, 2025, arXiv

Radio signatures of AGN-wind-driven shocks in elliptical galaxies: From simulations to observations

We investigate the synchrotron emission signatures of shocks driven by active galactic nucleus (AGN) wind in elliptical galaxies based on our two-dimensional axisymmetric hydrodynamic MACER numerical simulations. Using these simulation data, we calculate the synchrotron radiation produced by nonthermal electrons accelerated at shocks, adopting reasonable assumptions for the magnetic field and relativistic electron distribution (derived from diffusive shock acceleration theory), and predict the resulting observational signatures. In our fiducial model, shocks driven by AGN winds produce synchrotron emission with luminosities of approximately $10^{29}\,\mathrm{erg\,s^{-1}\,Hz^{-1}}$ in the radio band (0.5-5 GHz), with spectral indices of $α\approx -0.4$ to $-0.6$ during the strongest shock phases, gradually steepening to about $-0.8$ to $-1.4$ as the electron population ages. Spatially, the emission is initially concentrated in regions of strong shocks, later expanding into more extended, diffuse structures. We also apply our model to the dwarf elliptical galaxy Messier 32 (M32), and find remarkable consistency between our simulated emission and the observed nuclear radio source, suggesting that this radio component likely originates from hot-wind-driven shocks. Our results indicate that AGN winds not only influence galaxy gas dynamics through mechanical energy input but also yield direct observational evidence via nonthermal radiation. With the advent of next-generation radio facilities such as the FAST Core Array, SKA, and ngVLA, these emission signatures serve as important probes for detecting and characterizing AGN feedback.

preprint, 2023, arXiv

How Does Sharpness-Aware Minimization Minimize Sharpness?

Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks for various settings. However, the underlying working of SAM remains elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect, when full-batch gradients are applied. Furthermore, we also prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of Hessian when SAM is applied.

preprint, 2023, arXiv

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets. Motivated by the long-held belief that flatter minima lead to better generalization, this paper gives mathematical analysis and supporting experiments suggesting that normalization (together with accompanying weight-decay) encourages GD to reduce the sharpness of loss surface. Here "sharpness" is carefully defined given that the loss is scale-invariant, a known consequence of normalization. Specifically, for a fairly broad class of neural nets with normalization, our theory explains how GD with a finite learning rate enters the so-called Edge of Stability (EoS) regime, and characterizes the trajectory of GD in this regime via a continuous sharpness-reduction flow.

preprint, 2022, arXiv

A Chandra Survey of Milky Way Globular Clusters. III. Searching for X-ray Signature of Intermediate-mass Black Holes

Globular clusters (GCs) are thought to harbor the long-sought population of intermediate-mass black holes (IMBHs). We present a systematic search for a putative IMBH in 81 Milky Way GCs, based on archival Chandra X-ray observations. We find a significant X-ray source positionally coincident with the cluster center in only six GCs, with 0.5-8 keV luminosities between $\sim1\times 10^{30}~{\rm erg~s^{-1}}$ and $\sim 4\times10^{33}~{\rm erg~s^{-1}}$. However, the spectral and temporal properties of these six sources can also be explained in terms of binary stars. The remaining 75 GCs do not have a detectable central source, most with $3σ$ upper limits ranging between $10^{29-32}~{\rm erg~s^{-1}}$ over 0.5-8 keV, which are significantly lower than predicted for canonical Bondi accretion. To help understand the feeble X-ray signature, we perform hydrodynamic simulations of stellar wind accretion onto a $1000~{\rm M_\odot}$ IMBH from the most-bound orbiting star, for stellar wind properties consistent with either a main-sequence (MS) star or an asymptotic giant branch (AGB) star. We find that the synthetic X-ray luminosity for the MS case ($\sim 10^{19}\rm~erg~s^{-1}$) is far below the current X-ray limits. The predicted X-ray luminosity for the AGB case ($\sim 10^{34}\rm~erg~s^{-1}$), on the other hand, is compatible with the detected central X-ray sources, in particular the ones in Terzan 5 and NGC 6652. However, the probability of having an AGB star as the most-bound star around the putative IMBH is very low. Our study strongly suggests that it is very challenging to detect the accretion-induced X-ray emission from IMBHs, even if they were prevalent in present-day GCs.

preprint, 2022, arXiv

A Novel Ontology-guided Attribute Partitioning Ensemble Learning Model for Early Prediction of Cognitive Deficits using Quantitative Structural MRI in Very Preterm Infants

Structural magnetic resonance imaging studies have shown that brain anatomical abnormalities are associated with cognitive deficits in preterm infants. Brain maturation and geometric features can be used with machine learning models for predicting later neurodevelopmental deficits. However, traditional machine learning models suffer from a large feature-to-instance ratio (i.e., a large number of features but a small number of instances/samples). Ensemble learning is a paradigm that strategically generates and integrates a library of machine learning classifiers and has been successfully used on a wide variety of predictive modeling problems to boost model performance. Attribute (i.e., feature) bagging is the most commonly used feature partitioning scheme; it randomly and repeatedly draws feature subsets from the entire feature set. Although attribute bagging can effectively reduce feature dimensionality to handle the large feature-to-instance ratio, it does not take into account domain knowledge or latent relationships among features. In this study, we proposed a novel Ontology-guided Attribute Partitioning (OAP) method to better draw feature subsets by considering the domain-specific relationships among features. With the better partitioned feature subsets, we developed an ensemble learning framework, which is referred to as OAP-Ensemble Learning (OAP-EL). We applied OAP-EL to predict cognitive deficits at 2 years of age using quantitative brain maturation and geometric features obtained at term equivalent age in very preterm infants. We demonstrated that the proposed OAP-EL approach significantly outperformed the peer ensemble learning and traditional machine learning approaches.

preprint, 2022, arXiv

Beauville-Voisin filtrations on zero cycles of moduli space of stable sheaves on K3 surfaces

The Beauville-Voisin conjecture predicts the existence of a filtration on projective hyper-Kähler manifolds opposite to the conjectural Bloch-Beilinson filtration, called the Beauville-Voisin filtration. Voisin has introduced a filtration on zero cycles of an arbitrary projective hyper-Kähler manifold. On the moduli space of stable objects of a projective K3 surface, there are other candidates constructed by Shen-Yin-Zhao, Barros-Flapan-Marian-Silversmith and more recently by Vial from different points of view. According to the work of Vial, all of them are proved to be equivalent except Voisin's filtration. In this paper, we show that Voisin's filtration is the same as the other filtrations. As an application, we prove a conjecture in Barros-Flapan-Marian-Silversmith's paper.

preprint, 2022, arXiv

CAHA/PPAK Integral-field Spectroscopic Observations of M81 -- I. Circumnuclear ionized gas

Galactic circumnuclear environments of nearby galaxies provide unique opportunities for our understanding of the co-evolution between super-massive black holes and their host galaxies. Here we present a detailed study of ionized gas in the central kiloparsec region of M81, which hosts the closest prototype low-luminosity active galactic nucleus, based on optical integral-field spectroscopic observations taken with the CAHA 3.5m telescope. It is found that much of the circumnuclear ionized gas is concentrated within a bright core of $\sim$200 pc in extent and a surrounding spiral-like structure known as the nuclear spiral. The total mass of the ionized gas is estimated to be $\sim2\times10^5\rm~M_\odot$, which corresponds to a few percent of the cold gas mass in this region, as traced by co-spatial dust extinction features. A plausible signature of a bi-conical outflow along the disk plane is suggested by a pair of blueshifted/redshifted low-velocity features, symmetrically located at $\sim$ 120 -- 250 pc from the nucleus. The spatially-resolved line ratios of [N II]/H$α$ and [O III]/H$β$ demonstrate that much of the circumnuclear region can be classified as LINER (low-ionization nuclear emission-line region). However, substantial spatial variations in the line intensities and line ratios strongly suggest that different ionization/excitation mechanisms, rather than just a central dominant source of photoionization, are simultaneously at work to produce the observed line signatures.

preprint, 2022, arXiv

CHANG-ES. XXIV. First Detection of A Radio Nuclear Ring and Potential LLAGN in NGC 5792

We report the discoveries of a nuclear ring of diameter 10" ($\sim$1.5 kpc) and a potential low luminosity active galactic nucleus (LLAGN) in the radio continuum emission map of the edge-on barred spiral galaxy NGC 5792. These discoveries are based on the Continuum Halos in Nearby Galaxies - an Expanded Very Large Array (VLA) Survey, as well as subsequent VLA observations of sub-arcsecond resolution. Using a mixture of H$α$ and 24 $μ$m calibration, we disentangle the thermal and non-thermal radio emission of the nuclear region, and derive a star formation rate (SFR) of $\sim 0.4~M_\odot$ yr$^{-1}$. We find that the nuclear ring is dominated by non-thermal synchrotron emission. The synchrotron-based SFR is about three times the mixture-based SFR. This result indicates that the nuclear ring underwent more intense star-forming activity in the past, and that its star formation is now in a low state. The sub-arcsecond VLA images resolve six individual knots on the nuclear ring. The equipartition magnetic field strength $B_{\rm eq}$ of the knots varies from 77 to 88 $μ$G. The radio ring surrounds a point-like faint radio core of $S_{\rm 6GHz}=(16\pm4)$ $μ$Jy with polarized lobes at the center of NGC 5792, which suggests an LLAGN with an Eddington ratio $\sim10^{-5}$. This radio nuclear ring is reminiscent of the Central Molecular Zone (CMZ) of the Galaxy. Both of them consist of a nuclear ring and LLAGN.

preprint, 2022, arXiv

Deligne-Beilinson cohomology of the universal K3 surface

O'Grady's generalized Franchetta conjecture (GFC) is concerned with codimension 2 algebraic cycles on universal polarized K3 surfaces. In \cite{BL17}, this conjecture has been studied in the Betti cohomology groups. Following a suggestion of Voisin, we investigate this problem in the Deligne-Beilinson (DB) cohomology groups. In this paper, we develop the theory of Deligne-Beilinson cohomology groups on separated (smooth) Deligne-Mumford stacks. Using the automorphic cohomology group and Noether-Lefschetz theory, we compute the 4-th DB-cohomology group of universal oriented polarized K3 surfaces with at worst an $A_1$-singularity and show that GFC for such a family holds in DB-cohomology. In particular, this confirms O'Grady's original conjecture in DB cohomology.

preprint, 2022, arXiv

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent

As part of the effort to understand implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all the previous results in this setting. It is shown that gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function. Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization. The latter result relies upon Nash's embedding theorem.

preprint, 2022, arXiv

Implicit Regularization and Convergence for Weight Normalization

Normalization methods such as batch [Ioffe and Szegedy, 2015], weight [Salimans and Kingma, 2016], instance [Ulyanov et al., 2016], and layer normalization [Ba et al., 2016] have been widely used in modern machine learning. Here, we study the weight normalization (WN) method [Salimans and Kingma, 2016] and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least-squares regression. WN and rPGD reparametrize the weights with a scale g and a unit vector w and thus the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and converge close to the minimum l2 norm solution, even for initializations far from zero. For certain stepsizes of g and w, we show that they can converge close to the minimum norm solution. This is different from the behavior of gradient descent, which converges to the minimum norm solution only when started at a point in the range space of the feature matrix, and is thus more sensitive to initialization.

preprint, 2022, arXiv

Interpretability of Neural Network With Physiological Mechanisms

Deep learning continues to serve as a powerful state-of-the-art technique that has achieved extraordinary accuracy in various regression and classification tasks, including image, video, signal, and natural language data. The original goal of proposing the neural network model was to improve the understanding of the complex human brain using a mathematical approach. However, recent deep learning techniques continue to lose interpretability of their functional process because they are treated mostly as black-box approximators. To address this issue, such an AI model needs to be biologically and physiologically realistic to incorporate a better understanding of human-machine evolutionary intelligence. In this study, we compare neural networks and biological circuits to discover the similarities and differences from various perspectives. We further discuss the insights into how neural networks learn from data by investigating human biological behaviors and understandable justifications.

preprint, 2022, arXiv

Mukai's program for non-primitive curves on K3 surfaces

Mukai's program seeks to recover a K3 surface $X$ from any curve $C$ on it by exhibiting it as a Fourier-Mukai partner to a Brill-Noether locus of vector bundles on the curve. In the case $X$ has Picard number one and the curve $C\in |H|$ is primitive, this was confirmed by Feyzbakhsh for $g\geq 11$ and $g\neq 12$. More recently, Feyzbakhsh has shown that certain moduli spaces of stable bundles on $X$ are isomorphic to the Brill-Noether locus of curves in $|H|$ if $g$ is sufficiently large. In this paper, we work with irreducible curves in a non-primitive ample linear system $|mH|$ and prove that Mukai's program is valid for any irreducible curve when $g\neq 2$, $mg\geq 11$ and $mg\neq 12$. Furthermore, we introduce the destabilising regions to improve Feyzbakhsh's analysis. We show that there are hyper-Kähler varieties as Brill-Noether loci of curves in every dimension.

preprint, 2022, arXiv

Peridynamic modeling for impact failure of wet concrete considering the influence of saturation

In this paper, a modified intermediately homogenized peridynamic (IH-PD) model for analyzing impact failure of wet concrete has been presented under the configuration of ordinary state-based peridynamic theory. The meso-structural properties of concrete are linked to the macroscopic mechanical behavior in the IH-PD model, where the heterogeneity of concrete is taken into account, and the calculation cost does not increase. Simultaneously, the porosity of concrete is considered, which is implemented by deleting the bond between two material points, as well as the influence of porosity on the mechanical properties of concrete. Moreover, the effective bulk and shear modulus of cement mortar in wet concrete (saturated and unsaturated concrete) are calculated respectively. The dynamic model for wet concrete is described from three aspects: strength, dynamic increase factor, and equation of state. Validation of the proposed model is established through analyzing some benchmark tests and comparing with the corresponding experiment and other available numerical results.

preprint, 2022, arXiv

Quantum circuit architecture search on a superconducting processor

Variational quantum algorithms (VQAs) have shown strong evidence of provable computational advantages in diverse fields such as finance, machine learning, and chemistry. However, the heuristic ansatz exploited in modern VQAs is incapable of balancing the tradeoff between expressivity and trainability, which may lead to degraded performance when executed on noisy intermediate-scale quantum (NISQ) machines. To address this issue, here we demonstrate the first proof-of-principle experiment of applying an efficient automatic ansatz design technique, i.e., quantum architecture search (QAS), to enhance VQAs on an 8-qubit superconducting quantum processor. In particular, we apply QAS to tailor the hardware-efficient ansatz towards classification tasks. Compared with the heuristic ansatze, the ansatz designed by QAS improves test accuracy from 31% to 98%. We further explain this superior performance by visualizing the loss landscape and analyzing effective parameters of all ansatze. Our work provides concrete guidance for developing variable ansatze to tackle various large-scale quantum learning problems with advantages.

preprint, 2022, arXiv

Quenching of Massive Disk Galaxies in the IllustrisTNG Simulation

A rare population of massive disk galaxies have been found to invade the red sequence dominated by early-type galaxies. These red/quenched massive disk galaxies have recently attracted great interest regarding their formation and origins. The usually proposed quenching mechanisms, such as bar quenching and environment quenching, do not seem suitable for those bulge-less quenched disks in low-density environments. In this paper, we use the IllustrisTNG-300 simulation to investigate the formation of massive quenched central disk galaxies. It is found that these galaxies contain less gas than their star-forming counterparts and harbor giant supermassive black holes (SMBHs) (above $10^{8}M_{\odot}$). By tracing their formation history, we found that quenched disk galaxies formed early and preserved disk morphology for cosmological time scales. They have experienced less than one major merger on average and it is mainly mini-mergers (mass ratio $<$1/10) that contribute to the growth of their SMBHs. In the IllustrisTNG simulation the black hole feedback mode switches from thermal to kinetic feedback when the black hole mass exceeds $\sim 10^{8}M_{\odot}$; the kinetic mode is more efficient at ejecting gas from the galaxy and suppressing further cooling of the hot gaseous halo. We conclude that kinetic AGN feedback is the dominant quenching mechanism in massive red/quenched disk galaxies.

preprint, 2022, arXiv

Robust Training of Neural Networks Using Scale Invariant Architectures

In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, i.e. the scale of parameter doesn't affect the output of the network, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\tfrac{2λ}η}$, where $η$ is learning rate and $λ$ is weight decay. We show that this general approach is robust to rescaling of parameter and loss by proving that its convergence only depends logarithmically on the scale of initialization and loss, whereas the standard SGD might not even converge for many initializations. Following our recipe, we design a scale invariant version of BERT, called SIBERT, which when trained simply by vanilla SGD achieves performance comparable to BERT trained by adaptive methods like Adam on downstream tasks.
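Two ingredients of the recipe above can be sketched in a few lines, under stated assumptions: the toy predictor, the proportionality constant `c`, and all function names are hypothetical and are not taken from the paper. The first function shows what scale invariance means (rescaling the weights leaves the output unchanged); the other two compute a clipping threshold of the form $c\,\|w\|\sqrt{2λ/η}$ and apply it to a global gradient norm.

```python
import math


def scale_invariant_output(w, x):
    """Toy scale-invariant predictor: only the direction of w matters."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return sum(wi * xi for wi, xi in zip(w, x)) / norm


def clip_threshold(weight_norm, lr, weight_decay, c=1.0):
    """Threshold c * ||w|| * sqrt(2 * weight_decay / lr); c is an assumed constant."""
    return c * weight_norm * math.sqrt(2.0 * weight_decay / lr)


def clip_global_norm(grads, max_norm):
    """Rescale the whole gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    return [g * max_norm / norm for g in grads]


w, x = [3.0, 4.0], [1.0, 2.0]
print(scale_invariant_output(w, x))                     # same value ...
print(scale_invariant_output([10 * wi for wi in w], x)) # ... after rescaling w

threshold = clip_threshold(weight_norm=5.0, lr=0.1, weight_decay=0.05)
print(clip_global_norm([30.0, 40.0], threshold))
```

Tying the threshold to the weight norm means the clipping level rescales along with the parameters, which fits the abstract's emphasis on robustness to parameter rescaling.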

preprint, 2022, arXiv

Unpolarized Shafarevich conjectures for hyper-Kähler varieties

The Shafarevich conjecture/problem concerns the finiteness of isomorphism classes of a family of varieties defined over a number field with good reduction outside a finite collection of places. For K3 surfaces, such a finiteness result was proved by Y. She. For hyper-Kähler varieties, which are higher-dimensional analogs of K3 surfaces, Y. André has verified the Shafarevich conjecture for hyper-Kähler varieties of a given dimension and admitting a very ample polarization of bounded degree. In this paper, we provide a unification of both results by proving the (unpolarized) Shafarevich conjecture for hyper-Kähler varieties in a given deformation type. In a similar fashion, generalizing a result of Orr and Skorobogatov on K3 surfaces, we prove the finiteness of geometric isomorphism classes of hyper-Kähler varieties of CM type in a given deformation type defined over a number field with bounded degree. A key to our approach is a uniform Kuga--Satake map, inspired by She's work, and we study its arithmetic properties, which are of independent interest.

preprint2022arXiv

Very Large Array Multi-band Radio Imaging of the Triple AGN Candidate SDSS J0849+1114

Kpc-scale triple active galactic nuclei (AGNs), potential precursors of gravitationally-bound triple massive black holes (MBHs), are rarely seen objects and believed to play an important role in the evolution of MBHs and their host galaxies. In this work we present a multi-band (3.0, 6.0, 10.0, and 15.0 GHz), high-resolution radio imaging of the triple AGN candidate, SDSS J0849+1114, using the Very Large Array. Two of the three nuclei (A and C) are detected at 3.0, 6.0, and 15 GHz for the first time, both exhibiting a steep spectrum over 3--15 GHz (with a spectral index $-0.90 \pm 0.05$ and $-1.03 \pm 0.04$) consistent with a synchrotron origin. Nucleus A, the strongest nucleus among the three, shows a double-sided jet, with the jet orientation changing by $\sim20^{\circ}$ between its inner 1" and the outer 5.5" (8.1 kpc) components, which may be explained as the MBH's angular momentum having been altered by merger-enhanced accretion. Nucleus C also shows a two-sided jet, with the western jet inflating into a radio lobe with an extent of 1.5" (2.2 kpc). The internal energy of the radio lobe is estimated to be $\rm 5.0 \times 10^{55}$ erg, for an equipartition magnetic field strength of $\rm \sim 160\ μG$. No significant radio emission is detected at all four frequencies for nucleus B, yielding an upper limit of 15, 15, 15, and 18 $\rm μJy\ beam^{-1}$ at 3.0, 6.0, 10.0, and 15.0 GHz, based on which we constrain the star formation rate in nucleus B to be $\lesssim 0.4~\rm M_{\odot}~yr^{-1}$.

preprint2022arXiv

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework

Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold. Intuitively, with a sufficiently small learning rate $η$, SGD tracks Gradient Descent (GD) until it gets close to such a manifold, where the gradient noise prevents further convergence. In such a regime, Blanc et al. (2020) proved that SGD with label noise locally decreases a regularizer-like term, the sharpness of loss, $\mathrm{tr}[\nabla^2 L]$. The current paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991). It allows in principle a complete characterization of the regularization effect of SGD around such a manifold -- i.e., the "implicit bias" -- using a stochastic differential equation (SDE) describing the limiting dynamics of the parameters, which is determined jointly by the loss function and the noise covariance. This yields some new results: (1) a global analysis of the implicit bias valid for $η^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $η^{-1.6}$ steps and (2) allowing arbitrary noise covariance. As an application, we show that with arbitrarily large initialization, label noise SGD can always escape the kernel regime and only requires $O(κ\ln d)$ samples for learning a $κ$-sparse overparametrized linear model in $\mathbb{R}^d$ (Woodworth et al., 2020), while GD initialized in the kernel regime requires $Ω(d)$ samples. This upper bound is minimax optimal and improves the previous $\tilde{O}(κ^2)$ upper bound (HaoChen et al., 2020).

preprint2021arXiv

Evidence for A Hot Wind from High-resolution X-ray Spectroscopic Observation of the Low-luminosity Active Galactic Nucleus in NGC 7213

Super-massive black holes (SMBHs) spend most of their lifetime accreting at a rate well below the Eddington limit, manifesting themselves as low-luminosity active galactic nuclei (LLAGNs). The prevalence of a hot wind from LLAGNs is a generic prediction by theories and numerical simulations of black hole accretion and has recently become a crucial ingredient of AGN kinetic feedback in cosmological simulations of galaxy evolution. However, direct observational evidence for this hot wind is still scarce. In this work, we identify significant Fe XXVI Ly$α$ and Fe XXV K$α$ emission lines from high-resolution Chandra grating spectra of the LLAGN in NGC\,7213, a nearby Sa galaxy hosting a $\sim10^8\rm~M_\odot$ SMBH, confirming previous work. We find that these lines exhibit a blueshifted line-of-sight velocity of $\sim1100\rm~km~s^{-1}$ and a high XXVI Ly$α$ to XXV K$α$ flux ratio implying a $\sim16$ keV hot plasma. By confronting these spectral features with synthetic X-ray spectra based on our custom magnetohydrodynamical simulations, we find that the high-velocity, hot plasma is naturally explained by the putative hot wind driven by the hot accretion flow powering this LLAGN. Alternative plausible origins of this hot plasma, including stellar activities, AGN photoionization and the hot accretion flow itself, are quantitatively disfavored. The inferred kinetic energy and momentum carried by the wind can serve as strong feedback to the environment. We compare NGC\,7213 to M81*, in which strong evidence for a hot wind was recently presented, and discuss implications on the universality and detectability of hot winds from LLAGNs.

preprint2021arXiv

Uniqueness for fractional nonsymmetric diffusion equations and an application to an inverse source problem

In this paper, we discuss the uniqueness of solutions to the time-fractional diffusion equation $\partial_t^α(u-u_0) + Au=0$ with the homogeneous Dirichlet boundary condition, where an elliptic operator $-A$ is not necessarily symmetric. We prove that the solution is identically zero if its normal derivative with respect to the operator $A$ vanishes on an arbitrarily small part of the spatial domain over a time interval. The proof is based on the Laplace transform and the spectral decomposition, and is valid for more general time-fractional partial differential equations, including those involving non-symmetric operators.

preprint2020arXiv

A Herschel mapping of [C ii], [O i] and [O iii] lines from the circumnuclear region of M31

The circumnuclear region of M31, consisting of multiphase interstellar medium, provides a close-up view of the interaction of the central supermassive black hole and surrounding materials. Far-infrared (FIR) fine-structure lines and their flux ratios can be used as diagnostics of physical properties of the neutral gas in this region. Here we present the first FIR spectroscopic mapping of the circumnuclear region of M31 in [C ii] 158 um, [O i] 63 um and [O iii] 88 um lines with the Herschel Space Observatory, covering a ~500 x 500 pc (2' x 2') field. Significant emission in all three lines is detected along the so-called nuclear spiral across the central kpc of M31. The velocity fields of the three lines, at a spatial resolution of ~50 pc, are broadly consistent with each other and with previous CO(3-2) line observations in the central region. Combined with existing [C ii] and CO(3-2) observations of five other fields targeting the disk, we derived the radial distribution of the [C ii]/CO(3-2) flux ratio, and found that this ratio is higher in the center than in the disk, indicating a low gas density and strong radiation field in the central region. We also found that the [C ii]/FIR ratio in the central region is 5.4 (+-0.8) x 10^-3, which exhibits an increasing trend with the galactocentric radius, suggesting an increasing contribution from old stellar population to dust heating towards the center.

preprint2020arXiv

Calibrated Intervention and Containment of the COVID-19 Pandemic

Within a short period of time, COVID-19 grew into a world-wide pandemic. Transmission by pre-symptomatic and asymptomatic viral carriers rendered intervention and containment of the disease extremely challenging. Based on reported infection case studies, we construct an epidemiological model that focuses on transmission around the symptom onset. The model is calibrated against incubation period and pairwise transmission statistics during the initial outbreaks of the pandemic outside Wuhan with minimal non-pharmaceutical interventions. Mathematical treatment of the model yields explicit expressions for the size of latent and pre-symptomatic subpopulations during the exponential growth phase, with the local epidemic growth rate as input. We then explore reduction of the basic reproduction number R_0 through specific disease control measures such as contact tracing, testing, social distancing, wearing masks and sheltering in place. When these measures are implemented in combination, their effects on R_0 multiply. We also compare our model behaviour to the first wave of the COVID-19 spreading in various affected regions and highlight generic and less generic features of the pandemic development.
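
The statement that the effects of combined measures on R_0 multiply can be illustrated with a small sketch. This assumes each measure is summarized by a fractional reduction between 0 and 1; the function name `effective_r0` and the numbers in the usage example are hypothetical and are not taken from the paper's calibrated model.

```python
from math import prod


def effective_r0(r0, reductions):
    """Apply one multiplicative factor (1 - reduction) per control measure.

    Hypothetical helper: `reductions` holds assumed fractional reductions,
    e.g. 0.2 means a measure alone would lower R_0 by 20%.
    """
    for r in reductions:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reduction must lie in [0, 1]")
    # Combined effect is the product of the individual factors.
    return r0 * prod(1.0 - r for r in reductions)


if __name__ == "__main__":
    # Illustrative values only: two measures with 50% and 20% reductions.
    print(effective_r0(2.5, [0.5, 0.2]))
```

Under this assumption, several moderate measures together can bring the reproduction number below 1 even when no single measure does so alone.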

preprint2020arXiv

Chemical abundances in Sgr A East: evidence for a Type Iax supernova remnant

Recent observations have shown a remarkable diversity of observational behaviors and explosion mechanisms in thermonuclear supernovae (SNe). An emerging class of peculiar thermonuclear SNe, called Type Iax, show photometric and spectroscopic behaviors distinct from normal Type Ia. Their origin remains highly controversial, but pure turbulent deflagration of white dwarfs (WDs) has been regarded as the leading formation theory. The large population of Type Iax indicates the existence of unidentified Galactic Type Iax supernova remnants (SNRs). We report evidence that SNR Sgr A East in the Galactic center resulted from a pure turbulent deflagration of a Chandrasekhar-mass carbon-oxygen WD, an explosion mechanism used for Type Iax SNe. Our X-ray spectroscopic study of Sgr A East using 3 Ms of Chandra data shows a low ratio of intermediate-mass elements to Fe and large Mn/Fe and Ni/Fe ratios. This abundance pattern does not accord with the core-collapse or normal Type Ia models. Sgr A East is thus the first Galactic SNR for which a likely Type Iax origin has been proposed and the nearest target for studying this peculiar class. We compared Sgr A East with the Fe-rich SNRs 3C 397 and W49B, which also have high Mn and Cr abundances and were claimed to result from deflagration-to-detonation explosions of Chandrasekhar-mass WDs (although with disputes). Our study shows that they have distinct abundance patterns. The X-ray spectroscopic studies of thermonuclear SNRs provide observational evidence for the theories that there are diverse explosion channels and various metal outputs for Chandrasekhar-mass WDs.

preprint2020arXiv

Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets

Mode connectivity is a surprising phenomenon in the loss landscape of deep nets. Optima -- at least those discovered by gradient-based optimization -- turn out to be connected by simple paths on which the loss function is almost constant. Often, these paths can be chosen to be piece-wise linear, with as few as two segments. We give mathematical explanations for this phenomenon, assuming generic properties (such as dropout stability and noise stability) of well-trained deep nets, which have previously been identified as part of understanding the generalization properties of deep nets. Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory.

preprint2020arXiv

Exploring the Mass Segregation Effect of X-ray Sources in Globular Clusters. III. Signs of Binary Disruption in M28

Using archival {\it Chandra} observations with a total effective exposure of 323 ks, we derive an updated catalog of point sources in the bulge globular cluster M28. The catalog contains 502 X-ray sources within an area of $\sim475\, \rm arcmin^{2}$, and more than $90\%$ of these sources are first detected in this cluster. We find significant dips in the radial distribution profiles of X-ray sources in M28, with the projected distance and width of the distribution dip for bright ($L_{X} \gtrsim 4.5\times 10^{30} {\rm\ erg\ \,s^{-1}}$) X-ray sources being larger than those for the faint ($L_{X} \lesssim 4.5\times 10^{30} {\rm\ erg\ \,s^{-1}}$) sources. The "generalized King model" fitting gives a slightly larger average mass for the bright sources ($1.30\pm0.15\,M_{\odot}$) than for the faint sources ($1.09\pm0.14\,M_{\odot}$), which supports a universal mass segregation delay between heavy objects in GCs. Compared with 47 Tuc and Terzan 5, we show that the dynamical age of M28 is comparable to that of Terzan 5 and much smaller than that of 47 Tuc, but it is evolving faster (i.e., with a smaller two-body relaxation timescale) than 47 Tuc. These features may suggest an acceleration effect of cluster dynamical evolution by tidal shock in M28. Besides, we find an abnormal deficiency of X-ray sources in the central region ($R \lesssim 1.5 \rm~arcmin$) of M28 relative to its outskirts, which indicates that M28 may have suffered an early phase of primordial binary disruption within its central region, and that the mass segregation effect will erase such a phenomenon as the cluster evolves to an older dynamical age.

preprint2020arXiv

Galactic Center IRS13E: Colliding Stellar Winds or an Intermediate Mass Black Hole?

A small cluster of massive stars residing in the Galactic center, collectively known as IRS13E, is of special interest due to its close proximity to Sgr A* and the possibility that an embedded intermediate-mass black hole (IMBH) binds its member stars. It has been suggested that colliding winds from two member stars, both classified as Wolf-Rayet type, are responsible for the observed X-ray, infrared and radio emission from IRS13E. We have conducted an in-depth study of the X-ray spatial, temporal and spectral properties of IRS13E, based on 5.6 Ms of ultra-deep Chandra observations obtained over 20 years. These X-ray observations show no significant evidence for source variability. We have also explored the kinematics of the cluster members, using Keck near-infrared imaging and spectroscopic data on a 14-yr baseline that considerably improve the accuracy of stars' proper motions. The observations are interpreted using 3-dimensional hydrodynamical simulations of colliding winds tailored to match the physical conditions of IRS13E, leading us to conclude that the observed X-ray spectrum and morphology can be well explained by the colliding wind scenario, in the meantime offering no support for the presence of a putative IMBH. An IMBH more massive than a few $10^3{\rm~M_\odot}$ is also strongly disfavored by the stellar kinematics.

preprint2020arXiv

Progressive Learning and Disentanglement of Hierarchical Representations

Learning rich representations from data is an important task for deep generative models such as the variational auto-encoder (VAE). However, by extracting high-level abstractions in the bottom-up inference process, the goal of preserving all factors of variation for top-down generation is compromised. Motivated by the concept of "starting small", we present a strategy to progressively learn independent hierarchical representations from high to low levels of abstraction. The model starts by learning the most abstract representation, and then progressively grows the network architecture to introduce new representations at different levels of abstraction. We quantitatively demonstrate the ability of the presented model to improve disentanglement in comparison to existing works on two benchmark data sets using three disentanglement metrics, including a new metric we proposed to complement the previously-presented metric of mutual information gap. We further present both qualitative and quantitative evidence on how the progression of learning improves disentangling of hierarchical representations. By drawing on the respective advantages of hierarchical representation learning and progressive learning, this is to our knowledge the first attempt to improve disentanglement by progressively growing the capacity of VAE to learn hierarchical representations.

preprint2020arXiv

Resolving the Nuclear Radio Emission from M32 with Very Large Array

The Local Group dwarf elliptical galaxy M32 hosts one of the nearest and most under-luminous super-massive black holes (SMBHs) ever known, offering a rare opportunity to study the physics of accreting SMBHs at the most quiescent state. Recent Very Large Array (VLA) observations have detected a radio source at the nucleus of M32, which is suggested to be the radio counterpart of the SMBH. To further investigate the radio properties of this nuclear source, we have conducted follow-up, high-resolution VLA observations in four epochs between 2015 and 2017, each with dual frequencies. At 6 GHz, the nuclear source is resolved under an angular resolution of $\sim0.4''$, exhibiting a coreless, slightly lopsided morphology with a detectable extent of $\sim2.5''$ ($\sim$10 parsec). No significant variability can be found among the four epochs. At 15 GHz, no significant emission can be detected within the same region, pointing to a steep intrinsic radio spectrum (with a $3σ$ upper limit of -1.46 for the spectral index). We discuss possible scenarios for the nature of this nuclear source and conclude that a stellar origin, in particular planetary nebulae, X-ray binaries, supernova remnants or diffuse ionized gas powered by massive stars, can be ruled out. Instead, the observed radio properties can be explained by synchrotron radiation from a hypothetical wind driven by the weakly accreting SMBH.

preprint2020arXiv

Semi-supervised Medical Image Classification with Global Latent Mixing

Computer-aided diagnosis via deep learning relies on large-scale annotated data sets, which can be costly when involving expert knowledge. Semi-supervised learning (SSL) mitigates this challenge by leveraging unlabeled data. One effective SSL approach is to regularize the local smoothness of neural functions via perturbations around single data points. In this work, we argue that regularizing the global smoothness of neural functions by filling the void in between data points can further improve SSL. We present a novel SSL approach that trains the neural network on linear mixing of labeled and unlabeled data, at both the input and latent space in order to regularize different portions of the network. We evaluated the presented model on two distinct medical image data sets for semi-supervised classification of thoracic disease and skin lesion, demonstrating its improved performance over SSL with local perturbations and SSL with global mixing but at the input space only. Our code is available at https://github.com/Prasanna1991/LatentMixing.

preprint2020arXiv

Well-posedness for the backward problems in time for general time-fractional diffusion equation

In this article, we consider a partial differential equation with Caputo time-derivative: $\partial_t^αu + Au = F$ where $0< α< 1$ and $u$ satisfies the zero Dirichlet boundary condition. For a non-symmetric elliptic operator $-A$ of the second order and given $F$, we prove the well-posedness for the backward problem in time and our result generalizes the existing results assuming that $A$ is symmetric. The key is the perturbation argument and the completeness of the generalized eigenfunctions of the elliptic operator $A$.

preprint2020arXiv

What has quenched the massive spiral galaxies?

Quenched massive spiral galaxies have attracted great attention recently, as more data is available to constrain their environment and cold gas content. However, the quenching mechanism is still uncertain, as it depends on the mass range and baryon budget of the galaxy. In this letter, we report the identification of a rare population of very massive, quenched spiral galaxies with stellar mass $\gtrsim10^{11}{\rm~M_\odot}$ and halo mass $\gtrsim10^{13}{\rm~M_\odot}$ from the Sloan Digital Sky Survey at redshift $z\sim0.1$. Our CO observations using the IRAM-30m telescope show that these galaxies contain only a small amount of molecular gas. Similar galaxies are also seen in the state-of-the-art semi-analytical models and hydro-dynamical simulations. It is found from these theoretical models that these quenched spiral galaxies harbor massive black holes, suggesting that feedback from the central black holes has quenched these spiral galaxies. This quenching mechanism seems to challenge the popular scenario of the co-evolution between massive black holes and massive bulges.

preprint2019arXiv

Intra-cluster GC-LMXB in the Fornax galaxy cluster

The formation of low-mass X-ray binaries (LMXBs) is favored within dense stellar systems such as globular clusters (GCs). The connection between LMXBs and GCs has been extensively studied in the literature, but these studies have always been restricted to the innermost regions of galaxies. We present a study of LMXBs in GCs within the central 1.5 deg^2 of the Fornax cluster with the aim of confirming the existence of a population of LMXBs in intra-cluster GCs and understanding whether their properties are related to the host GCs, to the environment and/or to different formation channels.

preprint2019arXiv

The HASHTAG project I. A Survey of CO(3-2) Emission from the Star Forming Disc of M31

We present a CO(3-2) survey of selected regions in the M31 disc as part of the JCMT large programme, HARP and SCUBA-2 High-Resolution Terahertz Andromeda Galaxy Survey (HASHTAG). The 12 CO(3-2) fields in this survey cover a total area of 60 square arcminutes, spanning a deprojected radial range of 2 - 14 kpc across the M31 disc. Combining these observations with existing IRAM 30m CO(1-0) observations and JCMT CO(3-2) maps of the nuclear region of M31, as well as dust temperature and star formation rate surface density maps, we are able to explore the radial distribution of the CO(3-2)/CO(1-0) integrated intensity ratio (R31) and its relationship with dust temperature and star formation. We find that the value of R31 between 2 - 9 kpc galactocentric radius is 0.14, significantly lower than what is seen in the nuclear ring at ~1 kpc (R31 ~ 0.8), only to rise again to 0.27 for the fields centred on the 10 kpc star forming ring. We also found that R31 is positively correlated with dust temperature, with Spearman's rank correlation coefficient $ρ$ = 0.55. The correlation between star formation rate surface density and CO(3-2) intensity is much stronger than with CO(1-0), with $ρ$ = 0.54 compared to -0.05, suggesting that the CO(3-2) line traces warmer and denser star forming gas better. We also find that R31 correlates well with star formation rate surface density, with $ρ$ = 0.69.

preprint2014arXiv

Initial-boundary value problems for multi-term time-fractional diffusion equations with positive constant coefficients

In this paper, we investigate the well-posedness and the long-time asymptotic behavior of the initial-boundary value problem for multi-term time-fractional diffusion equations, where the time differentiation consists of a finite sum of Caputo derivatives with decreasing orders in (0,1) and positive constant coefficients. By exploiting several important properties of multinomial Mittag-Leffler functions, various estimates follow from the explicit solutions in the form of these special functions. Then the uniqueness and continuous dependence on the initial value and source term are established, from which the continuous dependence of the solution of Lipschitz type with respect to various coefficients is also verified. Finally, by a Laplace transform argument, it turns out that the decay rate of the solution as time tends to infinity is dominated by the minimum order of the time-fractional derivatives.