Researcher profile

Xin Ma

Xin Ma contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing

Lifelong Model Editing aims to continuously update evolving facts in Large Language Models while preserving unrelated knowledge and general capabilities, yet it remains plagued by catastrophic forgetting and model collapse. Empirically, we find that recent editors resilient over long horizons share the same core strategy: Lifelong Normalization (LN), which normalizes value gradients using running statistics. Removing LN causes immediate performance collapse, and we observe a counter-intuitive positive cumulative effect where early edits can promote the success of future edits. Yet the mechanism of LN remains a "black box", leaving its precise role in lifelong stability poorly understood. In this work, we provide the first theoretical account of LN in the lifelong regime. Our analysis reveals a self-reinforcing stability loop and proves that, when combined with ridge-regularized regression, LN yields parameter updates with asymptotic orthogonality and bounded norms, directly mitigating forgetting and systemic collapse. Based on these insights, we derive StableEdit, which strengthens this stability loop via an explicit warm-up stage and full whitening, improving long-horizon stability at minimal overhead. Extensive experiments validate our theory and demonstrate competitive performance. Our code is available at https://github.com/MINE-USTC/StableEdit.

preprint2024arXiv

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions of total 4.1B words. Our core contribution is to develop a scalable approach to autonomously build a high-quality video-text dataset with large language models (LLM), thereby showcasing its efficacy in learning video-language representation at scale. Specifically, we utilize a multi-scale approach to generate video-related descriptions. Furthermore, we introduce ViCLIP, a video-text representation learning model based on ViT-L. Learned on InternVid via contrastive learning, this model demonstrates leading zero-shot action recognition and competitive video retrieval performance. Beyond basic video understanding tasks like recognition and retrieval, our dataset and model have broad applications. They are particularly beneficial for generating interleaved video-text data for learning a video-centric dialogue system, advancing video-to-text and text-to-video generation research. These proposed resources provide a tool for researchers and practitioners interested in multimodal video understanding and generation.

preprint2022arXiv

A categorical study on the generalized type semigroup

In this short note, we show that the generalized type semigroup $\CW(X, Γ)$ introduced by the author in \cite{M3} belongs to the category \textnormal{W}. In particular, we demonstrate that $\CW(X, Γ)$ satisfies axioms (W1)-(W4) and (W6). When $X$ is zero-dimensional, we also establish (W5) for the semigroup. This supports the analogy between the generalized type semigroup and pre-completed Cuntz semigroup $W(\cdot)$ for $C^*$-algebras.

preprint2022arXiv

A DNS Tunnel Sliding Window Differential Detection Method Based on Normal Distribution Reasonable Range Filtering

A covert attack method often used by APT organizations is the DNS tunnel, which is used to pass information by constructing C2 networks. And they often use the method of frequently changing domain names and server IP addresses to evade monitoring, which makes it extremely difficult to detect them. However, they carry DNS tunnel information traffic in normal DNS communication, which inevitably brings anomalies in some statistical characteristics of DNS traffic, so that it would provide security personnel with the opportunity to find them. Based on the above considerations, this paper studies the statistical discovery methodology of typical DNS tunnel high-frequency query behavior. Firstly, we analyze the distribution of the DNS domain name length and times and finds that the DNS domain name length and times follow the normal distribution law. Secondly, based on this distribution law, we propose a method for detecting and discovering high-frequency DNS query behaviors of non-single domain names based on the statistical rules of domain name length and frequency and we also give three theorems as theoretical support. Thirdly, we design a sliding window difference scheme based on the above method. Experimental results show that our method has a higher detection rate. At the same time, since our method does not need to construct a data set, it has better practicability in detecting unknown DNS tunnels. This also shows that our detection method based on mathematical models can effectively avoid the dilemma for machine learning methods that must have useful training data sets, and has strong practical significance.

preprint2022arXiv

Boundary actions of CAT(0) spaces and their $C^*$-algebras

In this paper, we study boundary actions of CAT(0) spaces from a point of view of topological dynamics and $C^*$-algebras. First, we investigate the actions of right-angled Coexter groups and right-angled Artin groups with finite defining graphs on the visual boundaries and the Nevo-Sageev boundaries of their natural assigned CAT(0) cube complexes. In particular, we establish (strongly) pure infiniteness results for reduced crossed product $C^*$-algebras of these actions through investigating the corresponding $\cat$ cube complexes and establishing necessary dynamical properties such as minimality, topological freeness and pure infiniteness of the actions. In addition, we study actions of fundamental groups of graphs of groups on the visual boundaries of their Bass-Serre trees. We show that the existence of repeatable paths essentially implies that the action is $2$-filling, from which, we also obtain a large class of unital Kirchberg algebras. Furthermore, our result also provides a new method in identifying $C^*$-simple generalized Baumslag-Solitar groups. The examples of groups obtained from our method have $n$-paradoxical towers in the sense of \cite{G-G-K-N}. This class particularly contains non-degenerated free products, Baumslag-Solitar groups and fundamental groups of $n$-circles or wedge sums of $n$-circles.

preprint2022arXiv

Compressing Models with Few Samples: Mimicking then Replacing

Few-sample compression aims to compress a big redundant model into a small compact one with only few samples. If we fine-tune models with these limited few samples directly, models will be vulnerable to overfit and learn almost nothing. Hence, previous methods optimize the compressed model layer-by-layer and try to make every layer have the same outputs as the corresponding layer in the teacher model, which is cumbersome. In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression, which firstly urges the pruned model to output the same features as the teacher's in the penultimate layer, and then replaces teacher's layers before penultimate with a well-tuned compact one. Unlike previous layer-wise reconstruction methods, our MiR optimizes the entire network holistically, which is not only simple and effective, but also unsupervised and general. MiR outperforms previous methods with large margins. Codes will be available soon.

preprint2022arXiv

Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling

Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet been criticized for learning inefficiency. We believe the insufficient utilization of training signals should be responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooting in orthogonal perspectives for training efficiency improvement, DM and JD cooperatively accelerate the training convergence yet not sacrificing the model generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) to report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.

preprint2022arXiv

FedSSO: A Federated Server-Side Second-Order Optimization Algorithm

In this work, we propose FedSSO, a server-side second-order optimization method for federated learning (FL). In contrast to previous works in this direction, we employ a server-side approximation for the Quasi-Newton method without requiring any training data from the clients. In this way, we not only shift the computation burden from clients to server, but also eliminate the additional communication for second-order updates between clients and server entirely. We provide theoretical guarantee for convergence of our novel method, and empirically demonstrate our fast convergence and communication savings in both convex and non-convex settings.

preprint2022arXiv

Multi-task Learning with High-Dimensional Noisy Images

Recent medical imaging studies have given rise to distinct but inter-related datasets corresponding to multiple experimental tasks or longitudinal visits. Standard scalar-on-image regression models that fit each dataset separately are not equipped to leverage information across inter-related images, and existing multi-task learning approaches are compromised by the inability to account for the noise that is often observed in images. We propose a novel joint scalar-on-image regression framework involving wavelet-based image representations with grouped penalties that are designed to pool information across inter-related images for joint learning, and which explicitly accounts for noise in high-dimensional images via a projection-based approach. In the presence of non-convexity arising due to noisy images, we derive non-asymptotic error bounds under non-convex as well as convex grouped penalties, even when the number of voxels increases exponentially with sample size. A projected gradient descent algorithm is used for computation, which is shown to approximate the optimal solution via well-defined non-asymptotic optimization error bounds under noisy images. Extensive simulations and application to a motivating longitudinal Alzheimer's disease study illustrate significantly improved predictive ability and greater power to detect true signals, that are simply missed by existing methods without noise correction due to the attenuation to null phenomenon.

preprint2022arXiv

Resolving subcategories and dimensions in recollements of extriangulated categories

Recently, Wang, Wei and Zhang introduced the notion of recollements of extriangulated categories. In this paper, let $(\mathcal{A},\mathcal{B},\mathcal{C})$ be a recollement of extriangulated categories. We provide some methods to construct resolving subcategories in $(\mathcal{A},\mathcal{B},\mathcal{C})$. As applications of the Auslander-Reiten correspondence, we get the gluing of cotilting modules in a recollement of module categories for artin algebras. We also give some bounds of resolution dimensions of the categories involved in $(\mathcal{A},\mathcal{B},\mathcal{C})$ with respect to resolving subcategories, which generalize some known results.

preprint2022arXiv

TL-GAN: Improving Traffic Light Recognition via Data Synthesis for Autonomous Driving

Traffic light recognition, as a critical component of the perception module of self-driving vehicles, plays a vital role in the intelligent transportation systems. The prevalent deep learning based traffic light recognition methods heavily hinge on the large quantity and rich diversity of training data. However, it is quite challenging to collect data in various rare scenarios such as flashing, blackout or extreme weather, thus resulting in the imbalanced distribution of training data and consequently the degraded performance in recognizing rare classes. In this paper, we seek to improve traffic light recognition by leveraging data synthesis. Inspired by the generative adversarial networks (GANs), we propose a novel traffic light generation approach TL-GAN to synthesize the data of rare classes to improve traffic light recognition for autonomous driving. TL-GAN disentangles traffic light sequence generation into image synthesis and sequence assembling. In the image synthesis stage, our approach enables conditional generation to allow full control of the color of the generated traffic light images. In the sequence assembling stage, we design the style mixing and adaptive template to synthesize realistic and diverse traffic light sequences. Extensive experiments show that the proposed TL-GAN renders remarkable improvement over the baseline without using the generated data, leading to the state-of-the-art performance in comparison with the competing algorithms that are used for general image synthesis and data imbalance tackling.

preprint2021arXiv

Adaptive Deconvolution-based stereo matching Net for Local Stereo Matching

In deep learning-based local stereo matching methods, larger image patches usually bring better stereo matching accuracy. However, it is unrealistic to increase the size of the image patch size without restriction. Arbitrarily extending the patch size will change the local stereo matching method into the global stereo matching method, and the matching accuracy will be saturated. We simplified the existing Siamese convolutional network by reducing the number of network parameters and propose an efficient CNN based structure, namely Adaptive Deconvolution-based disparity matching Net (ADSM net) by adding deconvolution layers to learn how to enlarge the size of input feature map for the following convolution layers. Experimental results on the KITTI 2012 and 2015 datasets demonstrate that the proposed method can achieve a good trade-off between accuracy and complexity.

preprint2021arXiv

Interfacial Dzyaloshinskii-Moriya interaction and spin-orbit torque in Au1-xPtx/Co bilayers with varying interfacial spin-orbit coupling

The quantitative roles of the interfacial spin-orbit coupling (SOC) in Dzyaloshinskii-Moriya interaction (DMI) and dampinglike spin-orbit torque (τDL) have remained unsettled after a decade of intensive study. Here, we report a conclusive experiment evidence that, because of the critical role of the interfacial orbital hybridization, the interfacial DMI is not necessarily a linear function of the interfacial SOC, e.g. at Au1-xPtx/Co interfaces where the interfacial SOC can be tuned significantly via strongly composition (x)-dependent spin-orbit proximity effect without varying the bulk SOC and the electronegativity of the Au1-xPtx layer. We also find that τDL in the Au1-xPtx/Co bilayers varies distinctly from the interfacial SOC as a function of x, indicating no important τDL contribution from the interfacial Rashba-Edelstein effect.

preprint2021arXiv

Point-line-based RGB-D SLAM and Bundle Adjustment Uncertainty Analysis

Most of the state-of-the-art indirect visual SLAM methods are based on the sparse point features. However, it is hard to find enough reliable point features for state estimation in the case of low-textured scenes. Line features are abundant in urban and indoor scenes. Recent studies have shown that the combination of point and line features can provide better accuracy despite the decrease in computational efficiency. In this paper, measurements of point and line features are extracted from RGB-D data to create map features, and points on a line are treated as keypoints. We propose an extended approach to make more use of line observation information. And we prove that, in the local bundle adjustment, the estimation uncertainty of keyframe poses can be reduced when considering more landmarks with independent measurements in the optimization process. Experimental results on two public RGB-D datasets demonstrate that the proposed method has better robustness and accuracy in challenging environments.