Source author record

Qi Song

Qi Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · cond-mat.mtrl-sci · Machine Learning · cond-mat.mes-hall · Artificial Intelligence · cond-mat.str-el · eess.IV · cond-mat.supr-con · Cryptography and Security · eess.AS · Sound

Catalog footprint

What is connected

21 works

11 topics

4 close collaborators

Preprint · 2026 · arXiv

Advanced Long-term Earth System Forecasting

Reliable long-term forecasting of Earth system dynamics is fundamentally limited by instabilities in current artificial intelligence (AI) models during extended autoregressive simulations. These failures often originate from inherent spectral bias, which leads to inadequate representation of critical high-frequency, small-scale processes and subsequent uncontrolled error amplification. Inspired by the nested grids that numerical models use to resolve small scales, we present TritonCast. At the core of its design is a dedicated latent dynamical core, which ensures the long-term stability of the macro-evolution at a coarse scale. An outer structure then fuses this stable trend with fine-grained local details. This design effectively mitigates the spectral bias caused by cross-scale interactions. In atmospheric science, TritonCast achieves state-of-the-art accuracy on the WeatherBench 2 benchmark while demonstrating exceptional long-term stability: it executes year-long autoregressive global forecasts and completes multi-year climate simulations that span the entire available 2,500-day test period without drift. In oceanography, it extends skillful eddy forecasts to 120 days and exhibits unprecedented zero-shot cross-resolution generalization. Ablation studies reveal that this performance stems from the synergistic interplay of the architecture's core components. TritonCast thus offers a promising pathway towards a new generation of trustworthy, AI-driven simulations, with the potential to accelerate discovery in climate and Earth system science by enabling more reliable long-term forecasting and deeper insights into complex geophysical dynamics.

Preprint · 2026 · arXiv

Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction

Dense 3D reconstruction from continuous image streams requires both accurate geometric aggregation and stable long-term memory management. Recent feed-forward reconstruction frameworks integrate observations through persistent memory representations, yet most rely primarily on appearance-based similarity when updating memory. Such appearance-driven integration often leads to redundant accumulation of observations and unstable geometry when viewpoint changes occur. In this work, we propose a ray-aware pointer memory for streaming 3D reconstruction that explicitly models both spatial location and viewing direction within a unified memory representation. Each memory pointer stores its 3D position, associated ray direction, and feature embedding, allowing the system to reason jointly about geometric proximity and viewpoint consistency. Based on this representation, we introduce an adaptive pointer update strategy that replaces traditional fusion-based memory compression with a retain-or-replace mechanism. Instead of averaging nearby observations, the system selectively retains informative pointers while discarding redundant ones, preserving distinctive geometric structures while maintaining bounded memory growth. Furthermore, the joint reasoning over spatial distance and ray-direction discrepancy enables the system to distinguish between local redundancy, novel observations, and potential loop revisits in a unified manner. When loop candidates are detected, pose refinement is triggered to enforce global geometric consistency across the reconstruction. Extensive experiments demonstrate that the proposed ray-aware memory design significantly improves long-term reconstruction stability and camera pose accuracy while maintaining efficient streaming inference. Our approach provides a principled framework for scalable and drift-resistant online 3D reconstruction from image streams.
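The retain-or-replace update can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the `Pointer` fields (including a hypothetical `score` used to decide which of two overlapping pointers is more informative), the distance and angle thresholds, and the rule that a spatially and directionally close pointer written many frames ago signals a possible loop revisit.

```python
import math
from dataclasses import dataclass


@dataclass
class Pointer:
    position: tuple  # 3D point (x, y, z) in world coordinates
    ray_dir: tuple   # unit viewing direction of the ray that observed it
    score: float     # hypothetical "informativeness" score
    frame: int       # index of the frame that wrote this pointer


def angle_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def classify(memory, obs, dist_thr=0.05, angle_thr=30.0, loop_gap=100):
    """Label an observation 'novel', 'redundant' or 'loop' (hypothetical rules)."""
    if not memory:
        return "novel", None
    # Nearest existing pointer in 3D space.
    idx = min(range(len(memory)),
              key=lambda i: math.dist(memory[i].position, obs.position))
    near = memory[idx]
    if math.dist(near.position, obs.position) > dist_thr:
        return "novel", idx          # unexplored region
    if angle_deg(near.ray_dir, obs.ray_dir) > angle_thr:
        return "novel", idx          # same place, new viewpoint
    if obs.frame - near.frame > loop_gap:
        return "loop", idx           # same place and view, but seen long ago
    return "redundant", idx          # same place and view, seen recently


def update(memory, obs, **thresholds):
    """Retain-or-replace update: pointers are never averaged together."""
    label, idx = classify(memory, obs, **thresholds)
    if label == "novel":
        memory.append(obs)
    elif label == "redundant" and obs.score > memory[idx].score:
        memory[idx] = obs            # replace; otherwise the old pointer is retained
    # For "loop", memory is left unchanged; a full system would trigger pose refinement.
    return label
```

Because redundant observations either replace an existing pointer or are dropped, memory only grows for observations classified as novel, which is one simple way to keep growth bounded.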

Preprint · 2025 · arXiv

Towards Long-window Anchoring in Vision-Language Model Distillation

While large vision-language models (VLMs) demonstrate strong long-context understanding, their widely used small variants fail at vision-language alignment beyond a limited window size. We find that knowledge distillation, as a complement to Rotary Position Embeddings (RoPE), improves students' capability on window sizes anchored from large models. Building on this insight, we propose LAid, which directly targets the transfer of long-range attention mechanisms through two complementary components: (1) progressive distance-weighted attention matching, which dynamically emphasizes longer position differences during training, and (2) learnable RoPE response gain modulation, which selectively amplifies position sensitivity where needed. Extensive experiments across multiple model families demonstrate that LAid-distilled models achieve up to 3.2 times longer effective context windows than baseline small models, while maintaining or improving performance on standard VL benchmarks. Spectral analysis also suggests that LAid preserves crucial low-frequency attention components that conventional methods fail to transfer. Our work not only provides practical techniques for building more efficient long-context VLMs but also offers theoretical insights into how positional understanding emerges and transfers during distillation.
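A minimal NumPy sketch of what a progressive distance-weighted attention-matching loss could look like. The weighting scheme is an assumption, not LAid's published formula: each (query, key) pair gets weight 1 at distance 0, rising linearly with the relative token distance, and the slope grows with a hypothetical `progress` value (for example, training step divided by total steps). The `max_boost` constant is likewise illustrative.

```python
import numpy as np


def distance_weights(n, progress, max_boost=4.0):
    """Weight for each (query, key) pair: 1 at distance 0, rising linearly with
    |i - j| up to 1 + max_boost * progress at the largest distance."""
    idx = np.arange(n)
    rel_dist = np.abs(idx[:, None] - idx[None, :]) / max(n - 1, 1)
    return 1.0 + max_boost * progress * rel_dist


def weighted_attention_matching(student_attn, teacher_attn, progress):
    """Distance-weighted MSE between (..., n, n) attention maps.
    progress in [0, 1]; progress = 0 reduces to plain MSE."""
    n = student_attn.shape[-1]
    w = distance_weights(n, progress)  # broadcasts over leading batch/head dims
    return float(np.mean(w * (student_attn - teacher_attn) ** 2))
```

Early in training the loss treats all position pairs equally; as `progress` increases, a mismatch between distant tokens costs more than the same mismatch between neighbors, which is the "emphasize longer position differences" behavior described above.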

Preprint · 2022 · arXiv

Fully Attentional Network for Semantic Segmentation

Recent non-local self-attention methods have proven effective in capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map in $\mathbb{R}^{C\times C}$ (by compressing spatial dimensions) or in $\mathbb{R}^{HW\times HW}$ (by compressing channels) to describe the feature relations along either the channel or the spatial dimensions, where $C$ is the number of channels and $H$ and $W$ are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimensions, hence causing missing attention, which might lead to inferior results for small/thin categories or inconsistent segmentation inside large objects. To address this problem, we propose a new approach, namely the Fully Attentional Network (FLANet), to encode both spatial and channel attentions in a single similarity map while maintaining high computational efficiency. Specifically, for each channel map, FLANet can harvest feature responses from all other channel maps, and from the associated spatial positions as well, through a novel fully attentional module. Our new method achieves state-of-the-art performance on three challenging semantic segmentation datasets, i.e., 83.6%, 46.99%, and 88.5% on the Cityscapes test set, the ADE20K validation set, and the PASCAL VOC test set, respectively.
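To make the two conventional similarity maps concrete, here is a minimal NumPy sketch of plain channel attention and plain spatial attention in the style the abstract contrasts against (dot-product similarity plus softmax, with no learned projections, which is a simplifying assumption). It is not FLANet's fully attentional module; it only shows where the $C\times C$ and $HW\times HW$ maps come from and which dimension each one sums over.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention(feat):
    """feat: (C, H, W). Similarity map is C x C; the HW positions are summed out."""
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    A = softmax(X @ X.T)                     # (C, C)
    return (A @ X).reshape(C, H, W), A       # each channel = weighted mix of channels


def spatial_attention(feat):
    """feat: (C, H, W). Similarity map is HW x HW; the C channels are summed out."""
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    A = softmax(X.T @ X)                     # (HW, HW)
    return (X @ A.T).reshape(C, H, W), A     # each position = weighted mix of positions
```

In each case, the dimension used to compute the dot products is collapsed, so the resulting map carries no explicit information about it; that is the "attention missing" issue the abstract refers to.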

Preprint · 2022 · arXiv

Recursive Least Squares for Training and Pruning Convolutional Neural Networks

Convolutional neural networks (CNNs) have succeeded in many practical applications. However, their high computation and storage requirements often make them difficult to deploy on resource-constrained devices. To tackle this issue, many pruning algorithms have been proposed for CNNs, but most of them cannot prune CNNs to a reasonable level. In this paper, we propose a novel algorithm for training and pruning CNNs based on recursive least squares (RLS) optimization. After training a CNN for some epochs, our algorithm combines inverse input autocorrelation matrices and weight matrices to evaluate and prune unimportant input channels or nodes layer by layer. It then continues to train the pruned network and does not perform the next pruning step until the pruned network recovers the full performance of the original network. Besides CNNs, the proposed algorithm can also be used for feedforward neural networks (FNNs). Three experiments on the MNIST, CIFAR-10, and SVHN datasets show that our algorithm achieves more reasonable pruning and higher learning efficiency than four other popular pruning algorithms.
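The core RLS update for a single linear layer can be sketched in NumPy as follows. The update equations are the standard RLS recursion, where `P` tracks the inverse of the (regularized) input autocorrelation matrix. The `input_importance` method is a hypothetical Optimal-Brain-Surgeon-style saliency that combines `W` and `P`; the abstract does not give the paper's exact pruning criterion, so treat it as an illustration only.

```python
import numpy as np


class RLSLinear:
    """Linear map y = W x fitted with standard recursive least squares.
    P tracks the inverse of the (regularized) input autocorrelation matrix."""

    def __init__(self, n_in, n_out, delta=1e-3, forgetting=1.0):
        self.W = np.zeros((n_out, n_in))
        self.P = np.eye(n_in) / delta        # large initial P = weak prior on W
        self.lam = forgetting                # 1.0 = no forgetting

    def update(self, x, y):
        x = x.reshape(-1, 1)
        Px = self.P @ x
        k = Px / (self.lam + (x.T @ Px).item())  # gain vector, shape (n_in, 1)
        err = y.reshape(-1, 1) - self.W @ x      # a priori error, shape (n_out, 1)
        self.W += err @ k.T
        self.P = (self.P - k @ Px.T) / self.lam  # rank-1 update of the inverse
        return err

    def input_importance(self):
        """Hypothetical OBS-style saliency per input j: sum_o W[o, j]^2 / P[j, j].
        Low scores mark inputs whose removal should cost little."""
        return (self.W ** 2).sum(axis=0) / np.diag(self.P)
```

For example, when the target mapping ignores one input entirely, that input's saliency ends up near zero and it would be the first pruning candidate.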

Preprint · 2022 · arXiv

Recursive Least Squares Policy Control with Echo State Network

The echo state network (ESN) is a special type of recurrent neural network for processing time-series data. However, because of the strong correlation among the agent's sequential samples, it is difficult for ESN-based policy control algorithms to use the recursive least squares (RLS) algorithm to update the ESN's parameters. To solve this problem, we propose two novel policy control algorithms, ESNRLS-Q and ESNRLS-Sarsa. First, to reduce the correlation of training samples, we use the leaky integrator ESN and the mini-batch learning mode. Second, to make RLS suitable for training the ESN in mini-batch mode, we present a new mean-approximation method for updating the RLS correlation matrix. Third, to prevent the ESN from overfitting, we use L1 regularization. Lastly, to prevent overestimation of the target state-action value, we employ the Mellowmax method. Simulation results show that our algorithms have good convergence performance.
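The Mellowmax operator mentioned above has a simple closed form, $\mathrm{mm}_\omega(\mathbf{x}) = \frac{1}{\omega}\log\big(\frac{1}{n}\sum_i e^{\omega x_i}\big)$, which lies between the mean and the max of $\mathbf{x}$. A minimal Python sketch of the operator and of a TD target that uses it in place of a hard max follows; the `omega` and `gamma` defaults and the `mellowmax_target` helper are illustrative assumptions, not the paper's settings.

```python
import math


def mellowmax(values, omega):
    """mm_omega(x) = log(mean(exp(omega * x))) / omega, computed stably by
    factoring out the max before exponentiating."""
    m = max(values)
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / len(values)) / omega


def mellowmax_target(reward, next_q_values, gamma=0.99, omega=5.0, done=False):
    """TD target that replaces max_a Q(s', a) with a softer Mellowmax."""
    if done:
        return reward
    return reward + gamma * mellowmax(next_q_values, omega)
```

As `omega` grows, Mellowmax approaches the max; as it shrinks toward 0, it approaches the mean. Intermediate values damp the upward bias a hard max introduces when action-value estimates are noisy.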

Preprint · 2022 · arXiv

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods iteratively optimize an objective function via a specific deformation model along with meticulous parameter tuning, but they have limited capability in registering images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation fields, they are regression-based and prone to getting stuck in local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key notion is to warp a moving image successively at each time step until it finally aligns with a fixed image. Because handling high-dimensional continuous action and state spaces is challenging in the conventional reinforcement learning (RL) framework, we introduce a new concept, the "plan", into the standard Actor-Critic model; the plan is low-dimensional and helps the actor generate a tractable high-dimensional action. The entire framework is trained without supervision and operates end to end. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results show that our method achieves consistent, significant gains and outperforms state-of-the-art methods.

Preprint · 2022 · arXiv

Synergistic Network Learning and Label Correction for Noise-robust Image Classification

Large training datasets almost always contain examples with inaccurate or incorrect labels. Deep neural networks (DNNs) tend to overfit training label noise, which degrades model performance in practice. To address this problem, we propose a robust label correction framework that combines the ideas of small-loss selection and noise correction, learning network parameters and reassigning ground-truth labels iteratively. Exploiting the tendency of DNNs to learn meaningful patterns before fitting noise, our framework first trains two networks over the current dataset with small-loss selection. Based on the classification loss and agreement loss of the two networks, we measure the confidence of the training data. An increasing number of confident samples are selected for label correction during the learning process. We demonstrate our method on both synthetic and real-world datasets with different noise types and rates, including CIFAR-10, CIFAR-100, and Clothing1M, where it outperforms the baseline approaches.
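Small-loss selection with two networks can be sketched in plain Python. The keep-ratio schedule below is borrowed from Co-teaching (Han et al., 2018) and is not necessarily what this paper uses; `confident_indices` is a hypothetical confidence rule (small loss under both networks plus agreement on the predicted class) standing in for the paper's classification-loss and agreement-loss criterion.

```python
def keep_ratio(epoch, noise_rate, ramp_epochs):
    """Co-teaching-style schedule: keep all samples at first, then gradually
    drop up to `noise_rate` of them as the networks begin to memorize noise."""
    return 1.0 - min(epoch / ramp_epochs, 1.0) * noise_rate


def small_loss_indices(losses, ratio):
    """Indices of the `ratio` fraction of samples with the smallest loss."""
    n_keep = int(round(ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def confident_indices(losses_a, losses_b, preds_a, preds_b, ratio):
    """Hypothetical confidence rule: a sample is confident if it is a
    small-loss sample for both networks and both predict the same class."""
    both = set(small_loss_indices(losses_a, ratio)) & set(small_loss_indices(losses_b, ratio))
    return sorted(i for i in both if preds_a[i] == preds_b[i])
```

The schedule encodes the observation quoted above: early in training, DNNs fit clean patterns first, so it is safe to keep nearly everything; later, the highest-loss samples are increasingly likely to be mislabeled and are excluded.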

Preprint · 2022 · arXiv

Synthesis and electronic properties of Nd$_{n+1}$Ni$_{n}$O$_{3n+1}$ Ruddlesden-Popper nickelate thin films

The rare-earth nickelates possess a diverse set of collective phenomena including metal-to-insulator transitions, magnetic phase transitions, and, upon chemical reduction, superconductivity. Here, we demonstrate epitaxial stabilization of layered nickelates in the Ruddlesden-Popper form, Nd$_{n+1}$Ni$_n$O$_{3n+1}$, using molecular beam epitaxy. By optimizing the stoichiometry of the parent perovskite NdNiO$_3$, we can reproducibly synthesize the $n = 1$–$5$ member compounds. X-ray absorption spectroscopy at the O $K$ and Ni $L$ edges indicates systematic changes in both the nickel-oxygen hybridization level and the nominal nickel filling from 3$d^8$ to 3$d^7$ as we move across the series from $n = 1$ to $n = \infty$. The $n = 3$–$5$ compounds exhibit weakly hysteretic metal-to-insulator transitions with transition temperatures that depress with increasing order toward NdNiO$_3$ ($n = \infty$).

Preprint · 2022 · arXiv

Time-Frequency Attention for Monaural Speech Enhancement

Most studies on speech enhancement do not consider the energy distribution of speech in the time-frequency (T-F) representation, which is important for accurate prediction of masks or spectra. In this paper, we present a simple yet effective T-F attention (TFA) module, which produces a 2-D attention map that assigns differentiated weights to the spectral components of the T-F representation. To validate the effectiveness of the proposed TFA module, we use the residual temporal convolutional network (ResTCN) as the backbone and conduct extensive experiments on two commonly used training targets. Our experiments demonstrate that applying the TFA module significantly improves performance in terms of five objective evaluation metrics with negligible parameter overhead. The evaluation results show that the proposed ResTCN with the TFA module (ResTCN+TFA) consistently outperforms other baselines by a large margin.
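A minimal NumPy sketch of how a 2-D T-F attention map can be built and applied, under stated assumptions. We assume that the map is the outer product of a per-frame (time) weight vector and a per-bin (frequency) weight vector, and that each vector is obtained by average-pooling the spectrogram along the other axis, smoothing it with a short 1-D kernel, and applying a sigmoid. The fixed kernels are hypothetical stand-ins for learned 1-D convolutions; this is not the paper's TFA code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tf_attention(spec, k_time=None, k_freq=None):
    """spec: (T, F) non-negative T-F representation, with T and F at least the
    kernel length. Returns (re-weighted spec, (T, F) attention map)."""
    # Hypothetical fixed smoothing kernels standing in for learned 1-D convs.
    k_time = np.array([0.25, 0.5, 0.25]) if k_time is None else k_time
    k_freq = np.array([0.25, 0.5, 0.25]) if k_freq is None else k_freq
    time_stats = spec.mean(axis=1)   # (T,) average energy per frame
    freq_stats = spec.mean(axis=0)   # (F,) average energy per frequency bin
    a_time = sigmoid(np.convolve(time_stats, k_time, mode="same"))
    a_freq = sigmoid(np.convolve(freq_stats, k_freq, mode="same"))
    attn = np.outer(a_time, a_freq)  # (T, F) 2-D attention map, values in (0, 1)
    return spec * attn, attn
```

Only the two short kernels would be learnable in a design like this, which illustrates why such a module can add very few parameters to the backbone.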

Preprint · 2020 · arXiv

Category-wise Attack: Transferable Adversarial Examples for Anchor Free Object Detection

Deep neural networks have been shown to be vulnerable to adversarial attacks: subtle perturbations can completely change the classification results. This vulnerability has led to a surge of research in this direction. However, most work has been dedicated to attacking anchor-based object detection models. In this work, we present an effective and efficient algorithm that generates adversarial examples to attack anchor-free object detection models, based on two approaches. First, we conduct category-wise instead of instance-wise attacks on the object detectors. Second, we leverage high-level semantic information to generate the adversarial examples. Surprisingly, the generated adversarial examples are not only able to effectively attack the targeted anchor-free object detector but can also be transferred to attack other object detectors, even anchor-based detectors such as Faster R-CNN.

Preprint · 2020 · arXiv

Domain Embedded Multi-model Generative Adversarial Networks for Image-based Face Inpainting

Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the resolution of the generated missing region without explicitly considering the particularities of the human face, and they generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting face images with large cropped regions. We first represent only the face regions using a latent variable as the domain knowledge and combine it with the textures of the non-face parts to generate high-quality face images with plausible contents. Two adversarial discriminators are finally used to judge whether the generated distribution is close to the real distribution. The model can not only synthesize novel image structures but also explicitly use the embedded face domain knowledge to generate better predictions that are consistent in structure and appearance. Experiments on both the CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieves state-of-the-art performance and generates higher-quality inpainting results than existing methods.

Preprint · 2020 · arXiv

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

In recent years, scene text recognition has commonly been regarded as a sequence-to-sequence problem. Connectionist Temporal Classification (CTC) and attentional sequence recognition (Attn) are two prevailing approaches to this problem, but each may fail in certain scenarios. CTC concentrates more on every individual character but is weak at modeling semantic dependencies in text. Attn-based methods have better contextual semantic modeling ability but tend to overfit on limited training data. In this paper, we design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition. To overcome the weaknesses of CTC and Attn, both are applied in our method, but in different modules within two supervised branches that complement each other. Moreover, effective spatial and channel attention mechanisms are introduced to eliminate background noise and extract valid foreground information. Finally, a simple rectification network is implemented to rectify irregular text. ReADS can be trained end to end and requires only word-level annotations. Extensive experiments on various benchmarks verify the effectiveness of ReADS, which achieves state-of-the-art performance.
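The character-centric behavior of CTC is easiest to see in standard best-path (greedy) CTC decoding, sketched below in plain Python. This is generic CTC decoding rather than ReADS code; the convention that index 0 is the blank symbol is an assumption of the sketch.

```python
def ctc_best_path(frame_scores, symbols, blank=0):
    """Greedy CTC decoding.
    frame_scores: per-frame score lists over `symbols` (including the blank).
    Takes the argmax at each frame, merges consecutive repeats, drops blanks."""
    path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded = []
    prev = None
    for k in path:
        if k != blank and k != prev:
            decoded.append(symbols[k])
        prev = k
    return "".join(decoded)
```

Each frame is labeled independently, and decoding only merges repeats and removes blanks. A blank between two identical labels is what allows a doubled character such as "aa" to survive. Nothing in this process models which words or character sequences are plausible, which is the semantic weakness the abstract attributes to CTC.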

Preprint · 2020 · arXiv

Robust Multimodal Image Registration Using Deep Recurrent Reinforcement Learning

The crucial components of a conventional image registration method are the choice of the right feature representations and similarity measures. These two components, although elaborately designed, are somewhat handcrafted using human knowledge. In this work, we therefore tackle both components in an end-to-end manner via reinforcement learning. Specifically, an artificial agent, composed of a combined policy and value network, is trained to adjust the moving image in the right direction. We train this network using an asynchronous reinforcement learning algorithm, where a customized reward function is also leveraged to encourage robust image registration. The trained network is further combined with a lookahead inference step to improve its registration capability. The advantage of this algorithm is demonstrated by its superior performance on clinical MR and CT image pairs compared with other state-of-the-art medical image registration methods.

Preprint · 2016 · arXiv

Crystal Structure Manipulation of the Exchange Bias in an Antiferromagnetic Film

Exchange bias is one of the most extensively studied phenomena in magnetism because it exerts a unidirectional anisotropy on a ferromagnet (FM) coupled to an antiferromagnet (AFM); controlling the exchange bias is therefore very important for technological applications such as magnetic random access memory and giant magnetoresistance sensors. In this letter, we report crystal structure manipulation of the exchange bias in epitaxial hcp Cr2O3 films. By epitaxially growing twined (10-10)-oriented Cr2O3 thin films, in which the c axis and the spins of the Cr atoms lie in the film plane, we demonstrate that the exchange bias between Cr2O3 and an adjacent permalloy layer is tuned to in-plane, in contrast to the out-of-plane exchange bias observed in (0001)-oriented Cr2O3 films. This is owing to the collinear exchange coupling between the spins of the Cr atoms and the adjacent FM layer. Such a highly anisotropic exchange bias phenomenon is not possible in polycrystalline films.

Preprint · 2016 · arXiv

Experimental Investigation of Temperature-Dependent Gilbert Damping in Permalloy Thin Films

The Gilbert damping of ferromagnetic materials is arguably the most important but least understood phenomenological parameter that dictates real-time magnetization dynamics. Understanding the physical origin of Gilbert damping is highly relevant to developing future fast-switching spintronic devices such as magnetic sensors and magnetic random access memory. Here, we report an experimental study of temperature-dependent Gilbert damping in permalloy (Py) thin films of varying thicknesses by ferromagnetic resonance. From the thickness dependence, two independent contributions to the Gilbert damping are identified, namely bulk damping and surface damping. Of particular interest, bulk damping decreases monotonically as the temperature decreases, while surface damping shows an enhancement peak at ~50 K. These results provide important insight into the physical origin of Gilbert damping in ultrathin magnetic films.

Preprint · 2016 · arXiv

Magnetic anisotropy of the single-crystalline ferromagnetic insulator Cr2Ge2Te6

Cr2Ge2Te6 (CGT), a layered ferromagnetic insulator, has attracted a great deal of interest recently owing to its potential for integration with Dirac materials to realize the quantum anomalous Hall effect (QAHE) and to develop novel spintronics devices. Here, we study the uniaxial magnetic anisotropy energy of single-crystalline CGT and determine that the magnetic easy axis is directed along the c-axis in its ferromagnetic phase. In addition, CGT is an insulator below the Curie temperature. These properties make CGT a potentially promising candidate substrate for integration with topological insulators for the realization of the high-temperature QAHE.

Preprint · 2016 · arXiv

Positive Exchange Bias between Permalloy and Twined (10-10)-Cr2O3 Films

We report the discovery of a positive exchange bias between Ni80Fe20 (Py) and a twined (10-10)-Cr2O3 film near its blocking temperature (TB) when the sample is cooled in an in-plane magnetic field applied at 45 degrees to the two spin configurations of the Cr atoms. This behavior is anomalous compared with the negative exchange bias observed at all temperatures below TB when the cooling and measuring magnetic fields are applied along one of the two spin configurations of the Cr atoms. We speculate that these results could be related to the exchange interactions in the twined structure of the (10-10)-Cr2O3 film epitaxially grown on the rutile (001)-TiO2 substrate.

Preprint · 2016 · arXiv

Spin Injection and Inverse Edelstein Effect in the Surface States of Topological Kondo Insulator SmB6

There has been considerable interest in exploiting the spin degree of freedom of electrons for potential information storage and computing technologies. Topological insulators (TIs), a class of quantum materials, have special gapless edge/surface states in which the spin polarization of the Dirac fermions is locked to the momentum direction. This spin-momentum locking gives rise to interesting spin-dependent physical phenomena such as the Edelstein and inverse Edelstein effects. However, spin injection into the pure surface states of TIs is very challenging because of the coexisting, highly conducting bulk states. Here, we experimentally demonstrate spin injection and observe the inverse Edelstein effect in the surface states of a topological Kondo insulator, SmB6. At low temperatures, when only surface carriers are present, a clear spin signal is observed. Furthermore, the magnetic field angle dependence of the spin signal is consistent with the spin-momentum locking property of the surface states of SmB6.

Preprint · 2015 · arXiv

Epitaxial growth and properties of La0.7Sr0.3MnO3 thin films with micrometer wide atomic terraces

La0.7Sr0.3MnO3 (LSMO) films with extraordinarily wide atomic terraces are epitaxially grown on SrTiO3 (100) substrates by pulsed laser deposition. Atomic force microscopy measurements on the LSMO films show that the atomic step height is ~4 Å and the atomic terrace width is more than 2 micrometers. For a 20-monolayer (ML) LSMO film, the magnetization is determined to be 255 ± 15 emu/cm³ at room temperature, corresponding to 1.70 ± 0.11 Bohr magnetons per Mn atom. By varying the LSMO thickness from 8 to 20 MLs, the critical thickness for the temperature-dependent insulator-to-metal transition is shown to be 9 MLs. Furthermore, post-annealing in an oxygen environment improves the electron transport and magnetic properties of the LSMO films.

Preprint · 2015 · arXiv

Onset of the Meissner effect at 65 K in FeSe thin film grown on Nb doped SrTiO3 substrate

We report Meissner effect studies on an FeSe thin film grown on a Nb-doped SrTiO3 substrate by molecular beam epitaxy. Two-coil mutual inductance measurements clearly demonstrate the onset of diamagnetic screening at 65 K, which is consistent with the gap opening temperature determined by previous angle-resolved photoemission spectroscopy results. The applied magnetic field broadens the superconducting transition near the onset temperature, which is typical behavior for quasi-two-dimensional superconductors. Our results provide direct evidence that an FeSe thin film grown on a Nb-doped SrTiO3 substrate has an onset Tc of ~65 K, the highest among all iron-based superconductors discovered so far.