Source author record

Xin Xu

Xin Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

36 works

31 topics

4 close collaborators

preprint · 2026 · arXiv

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poorly understood. In this work, we argue that OPD's efficiency stems from a form of "foresight": it establishes a stable update trajectory toward the final model early in training. This foresight manifests in two aspects. First, at the Module-Allocation Level, OPD identifies regions with low marginal utility and concentrates updates on modules that are more critical to reasoning. Second, at the Update-Direction Level, OPD exhibits stronger low-rank concentration, with its dominant subspaces aligning closely with the final update subspace early in training. Building on these findings, we propose EffOPD, a plug-and-play acceleration method that speeds up OPD by adaptively selecting an extrapolation step size and moving along the current update direction. EffOPD requires no additional trainable modules or complex hyperparameter tuning, and achieves an average training acceleration of 3× while maintaining comparable final performance. Overall, our findings provide a parameter-dynamics perspective for understanding the efficiency of OPD and offer practical insights for designing more efficient post-training methods for large language models.

preprint · 2024 · arXiv

Degenerate bifurcations of two-fold doubly-connected uniformly rotating vortex patches

In this paper, we obtain families of two-fold doubly-connected uniformly rotating vortex patches of the 2-D incompressible Euler equations emanating from some specific annuli. The main difficulty comes from the strong degeneracy of the problem: the kernel of the linearization is not one-dimensional, and the transversality condition does not hold. To overcome this, we carry out a detailed analysis of the nonlinear functional, and the bifurcation curves are obtained by perturbing real algebraic varieties defined by truncated polynomials. In addition, our result partially answers a problem proposed by Hmidi and Mateu (Adv. Math. 302 (2016), 799-850).

preprint · 2022 · arXiv

Laboratory development of a heterodyne interferometric system for translation and tilt measurement of the proof mass in the space gravitational wave detection

Laser heterodyne interferometry plays a key role in monitoring and controlling the proof mass in space gravitational wave detection by measuring its motion in multiple degrees of freedom. This paper presents the laboratory development of a polarization-multiplexing heterodyne interferometer (PMHI) using quadrant photodetectors (QPD), intended for measuring the translation and tilt of a proof mass. The system has a symmetric design, which can be expanded to five-degree-of-freedom measurements based on polarization multiplexing and differential wavefront sensing (DWS). The ground-simulated experimental results demonstrate that measurement noise levels of 3 pm/Hz^{1/2} and 2 nrad/Hz^{1/2} at 1 Hz have been achieved for translation and tilt, respectively. For the current system, the tilt-to-length error is dominated by geometric misalignment, and the resulting coupling is at the micrometer level within a tilt range of 1000 μrad.

preprint · 2022 · arXiv

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Recent developments in speech processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Specifically, two typical tasks, speaker diarization and multi-speaker automatic speech recognition, have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle to the advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session is composed of 2-4 speakers with different speaker overlap ratios, recorded in rooms of different sizes. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems.

preprint · 2022 · arXiv

Noncontact measurement method of linear and angular displacement based on dual-beam feedback interferometric system

This study describes a unique optical approach for the noncontact measurement of linear and angular displacement. Compared to previous methods, the sensor system here based on the dual-beam phase-modulated feedback interferometry provides higher sensitivity for non-cooperative targets and a wider range concerning the angle measurement. The amount of linear and angular displacement is calculated by tracing the phase changes of the differential beams. Performance of the proposed method is evaluated via testing a prototype system. The prototype has a 35 nm and 0.15" stability over 1 hour, with a resolution of 1 nm and 0.02" correspondingly, according to the experimental data. The linearity is 5.58*10^{-6} in the range of 100 mm and 1.34*10^{-4} in the range of 360°, indicating that the proposed method may possess considerable potential for high-precision metrological applications.

preprint · 2022 · arXiv

Polarization effects on fluorescence emission of zebrafish neurons using light-sheet microscopy

Light-sheet fluorescence microscopy (LSFM) makes use of a thin plane of light to optically section and image transparent tissues or organisms in vivo, which has the advantages of fast imaging speed and low phototoxicity. In this paper, we have employed light-sheet microscopy to investigate the polarization effects on fluorescence emission of zebrafish neurons via modifying the electric oscillation orientation of the excitation light. The intensity of the fluorescence emission from the excited zebrafish larvae follows a cosine square function with respect to the polarization state of the excitation light and reveals a 40% higher fluorescence emission when the polarization orientation is orthogonal to the illumination and detection axes. Through registration and subtraction of fluorescence images under different polarization states, we have demonstrated that most of the enhanced fluorescence signals are from the nerve cells rather than the extracellular substance. This provides us a way to distinguish the cell boundaries and observe the organism structures with improved contrast and resolution.

preprint · 2022 · arXiv

Robust Learning-based Predictive Control for Discrete-time Nonlinear Systems with Unknown Dynamics and State Constraints

Robust model predictive control (MPC) is a well-known control technique for model-based control with constraints and uncertainties. In classic robust tube-based MPC approaches, an open-loop control sequence is computed via periodically solving an online nominal MPC problem, which requires prior model information and frequent access to onboard computational resources. In this paper, we propose an efficient robust MPC solution based on receding horizon reinforcement learning, called r-LPC, for unknown nonlinear systems with state constraints and disturbances. The proposed r-LPC utilizes a Koopman operator-based prediction model obtained off-line from pre-collected input-output datasets. Unlike classic tube-based MPC, in each prediction time interval of r-LPC, we use an actor-critic structure to learn a near-optimal feedback control policy rather than a control sequence. The resulting closed-loop control policy can be learned off-line and deployed online or learned online in an asynchronous way. In the latter case, online learning can be activated whenever necessary, for instance, when the safety constraint is violated with the deployed policy. The closed-loop recursive feasibility, robustness, and asymptotic stability are proven under function approximation errors of the actor-critic networks. Simulation and experimental results on two nonlinear systems with unknown dynamics and disturbances have demonstrated that our approach has better or comparable performance when compared with tube-based MPC and LQR, and outperforms a recently developed actor-critic learning approach.

preprint · 2022 · arXiv

Robust Tube-based Model Predictive Control with Koopman Operators--Extended Version

Koopman operators are of infinite dimension and capture the characteristics of nonlinear dynamics in a lifted global linear manner. The finite data-driven approximation of Koopman operators results in a class of linear predictors, useful for formulating linear model predictive control (MPC) of nonlinear dynamical systems with reduced computational complexity. However, the robustness of the closed-loop Koopman MPC under modeling approximation errors and possible exogenous disturbances is still a crucial issue to be resolved. To address this problem, this paper presents a robust tube-based MPC solution with Koopman operators, i.e., r-KMPC, for nonlinear discrete-time dynamical systems with additive disturbances. The proposed controller is composed of a nominal MPC using a lifted Koopman model and an off-line nonlinear feedback policy. The proposed approach does not assume the convergence of the approximated Koopman operator, which allows using a Koopman model with a limited order for controller design. Fundamental properties of the Koopman model, e.g., stabilizability and observability, are derived under standard assumptions, with which the closed-loop robustness and nominal point-wise convergence are proven. Simulated examples are presented to verify the effectiveness of the proposed approach.

preprint · 2022 · arXiv

Singular Limits for the Navier-Stokes-Poisson Equations of Viscous Plasma with Strong Density Boundary Layer

The quasi-neutral limit of the Navier-Stokes-Poisson system modeling a viscous plasma with vanishing viscosity coefficients in the half-space $\mathbb{R}^{3}_{+}$ is rigorously proved under a Navier-slip boundary condition for velocity and the Dirichlet boundary condition for electric potential. This is achieved by establishing the nonlinear stability of the approximation solutions involving the strong boundary layer in density and electric potential, which comes from the break-down of the quasi-neutrality near the boundary, and dealing with the difficulty of the interaction of this strong boundary layer with the weak boundary layer of the velocity field.

preprint · 2022 · arXiv

Spurious currents suppression by accurate difference schemes in multiphase lattice Boltzmann method

Spurious currents, which are often observed near a curved interface in the multiphase simulations by diffuse interface methods, are unphysical phenomena and usually damage the computational accuracy and stability. In this paper, the origination and suppression of spurious currents are investigated by using the multiphase lattice Boltzmann method driven by chemical potential. Both the difference error and insufficient isotropy of discrete gradient operator give rise to the directional deviations of nonideal force and then originate the spurious currents. Nevertheless, the high-order finite difference produces far more accurate results than the high-order isotropic difference. We compare several finite difference schemes which have different formal accuracy and resolution. When a large proportional coefficient is used, the transition region is narrow and steep, and the resolution of finite difference indicates the computational accuracy more exactly than the formal accuracy. On the contrary, for a small proportional coefficient, the transition region is wide and gentle, and the formal accuracy of finite difference indicates the computational accuracy better than the resolution. Furthermore, numerical simulations show that the spurious currents calculated in the 3D situation are highly consistent with those in 2D simulations; especially, the two-phase coexistence densities calculated by the high-order accuracy finite difference are in excellent agreement with the theoretical predictions of the Maxwell equal-area construction till the reduced temperature 0.2.

preprint · 2022 · arXiv

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and most challenging scenarios of speech technologies. The M2MeT challenge set up two tracks: speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Mandarin meeting speech data with manual annotation, including far-field data collected by an 8-channel microphone array as well as near-field data collected by each participant's headset microphone. We briefly describe the released dataset, track setups and baselines, and summarize the challenge results and major techniques used in the submissions.

preprint · 2022 · arXiv

Towards Generalizable Person Re-identification with a Bi-stream Generative Model

Generalizable person re-identification (re-ID) has attracted growing attention due to its powerful adaptation capability in the unseen data domain. However, existing solutions often neglect either crossing cameras (e.g., illumination and resolution differences) or pedestrian misalignments (e.g., viewpoint and pose discrepancies), which easily leads to poor generalization capability when adapted to the new domain. In this paper, we formulate these difficulties as: 1) Camera-Camera (CC) problem, which denotes the various human appearance changes caused by different cameras; 2) Camera-Person (CP) problem, which indicates the pedestrian misalignments caused by the same identity person under different camera viewpoints or changing pose. To solve the above issues, we propose a Bi-stream Generative Model (BGM) to learn the fine-grained representations fused with camera-invariant global feature and pedestrian-aligned local feature, which contains an encoding network and two stream decoding sub-networks. Guided by original pedestrian images, one stream is employed to learn a camera-invariant global feature for the CC problem via filtering cross-camera interference factors. For the CP problem, another stream learns a pedestrian-aligned local feature for pedestrian alignment using information-complete densely semantically aligned part maps. Moreover, a part-weighted loss function is presented to reduce the influence of missing parts on pedestrian alignment. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods on the large-scale generalizable re-ID benchmarks, involving domain generalization setting and cross-domain setting.

preprint · 2022 · arXiv

Weakly-supervised 3D Human Pose Estimation with Cross-view U-shaped Graph Convolutional Network

Although monocular 3D human pose estimation methods have made significant progress, the problem is far from being solved due to the inherent depth ambiguity. Instead, exploiting multi-view information is a practical way to achieve absolute 3D human pose estimation. In this paper, we propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation. By only using two camera views, our method can achieve state-of-the-art performance in a weakly-supervised manner, requiring no 3D ground truth but only 2D annotations. Specifically, our method contains two steps: triangulation and refinement. First, given the 2D keypoints that can be obtained through any classic 2D detection method, triangulation is performed across two views to lift the 2D keypoints into coarse 3D poses. Then, a novel cross-view U-shaped graph convolutional network (CV-UGCN), which can explore the spatial configurations and cross-view correlations, is designed to refine the coarse 3D poses. In particular, the refinement process is achieved through weakly-supervised learning, in which geometric and structure-aware consistency checks are performed. We evaluate our method on the standard benchmark dataset, Human3.6M. The Mean Per Joint Position Error on the benchmark dataset is 27.4 mm, which outperforms existing state-of-the-art methods remarkably (27.4 mm vs 30.2 mm).

preprint · 2022 · arXiv

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and Podcast, which covers a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition (OCR) based method is introduced to generate the audio/text segmentation candidates for the YouTube data on its corresponding video captions, while a high-quality ASR transcription system is used to generate audio/text pair candidates for the Podcast data. Then we propose a novel end-to-end label error detection approach to further validate and filter the candidates. We also provide three manually labelled high-quality test sets along with WenetSpeech for evaluation -- Dev for cross-validation purpose in training, Test_Net, collected from Internet for matched test, and Test_Meeting, recorded from real meetings for more challenging mismatched test. Baseline systems trained with WenetSpeech are provided for three popular speech recognition toolkits, namely Kaldi, ESPnet, and WeNet, and recognition results on the three test sets are also provided as benchmarks. To the best of our knowledge, WenetSpeech is the current largest open-sourced Mandarin speech corpus with transcriptions, which benefits research on production-level speech recognition.

36 published items

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

Degenerate bifurcations of two-fold doubly-connected uniformly rotating vortex patches

Laboratory development of a heterodyne interferometric system for translation and tilt measurement of the proof mass in the space gravitational wave detection

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Noncontact measurement method of linear and angular displacement based on dual-beam feedback interferometric system

Polarization effects on fluorescence emission of zebrafish neurons using light-sheet microscopy

Robust Learning-based Predictive Control for Discrete-time Nonlinear Systems with Unknown Dynamics and State Constraints

Robust Tube-based Model Predictive Control with Koopman Operators--Extended Version

Singular Limits for the Navier-Stokes-Poisson Equations of Viscous Plasma with Strong Density Boundary Layer

Spurious currents suppression by accurate difference schemes in multiphase lattice Boltzmann method

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

Towards Generalizable Person Re-identification with a Bi-stream Generative Model

Weakly-supervised 3D Human Pose Estimation with Cross-view U-shaped Graph Convolutional Network

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

Can speed up the convergence rate of stochastic gradient methods to $\mathcal{O}(1/k^2)$ by a gradient averaging strategy?

Coordinated Path Following Control of Fixed-wing Unmanned Aerial Vehicles

Derivative-free global minimization for a class of multiple minima problems

Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds

Exploring Image Enhancement for Salient Object Detection in Low Light Images

Generalized Equal Area Criterion for Stability Analysis of Nonlinear Oscillators

LSTM Networks for Music Generation

Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix

Person Re-Identification via Active Hard Sample Mining

Planning and Operations of Mixed Fleets in Mobility-on-Demand Systems

PointNet on FPGA for Real-Time LiDAR Point Cloud Processing

RTFN: A Robust Temporal Feature Network for Time Series Classification

RTFN: Robust Temporal Feature Network

Stochastic gradient-free descents

SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

Towards Critical Clearing Time Sensitivity for DAE Systems with Singularity

BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading

Visual Saliency Based on Scale-Space Analysis in the Frequency Domain

Robust frequency offset estimator for OFDM over fast varying multipath channel

Beyond Random Walk and Metropolis-Hastings Samplers: Why You Should Not Backtrack for Unbiased Graph Sampling

Generic Extensional Framework for the Memristive Systems