Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
25 works
0 followers
28 topics
4 close collaborators

Published work

25 published items

preprint · 2026 · arXiv

LLM-Oriented Information Retrieval: A Denoising-First Perspective

Modern information retrieval (IR) is no longer consumed primarily by humans but increasingly by large language models (LLMs) via retrieval-augmented generation (RAG) and agentic search. Unlike human users, LLMs are constrained by limited attention budgets and are uniquely vulnerable to noise; misleading or irrelevant information is no longer just a nuisance, but a direct cause of hallucinations and reasoning failures. In this perspective paper, we argue that denoising (maximizing usable evidence density and verifiability within a context window) is becoming the primary bottleneck across the full information access pipeline. We conceptualize this paradigm shift through a four-stage framework of IR challenges: from inaccessible to undiscoverable, to misaligned, and finally to unverifiable. Furthermore, we provide a pipeline-organized taxonomy of signal-to-noise optimization techniques, spanning indexing, retrieval, context engineering, verification, and agentic workflows. We also present research on information denoising in domains that rely heavily on retrieval, such as lifelong assistants, coding agents, deep research, and multimodal understanding.

preprint · 2024 · arXiv

Enhancement of cell membrane poration by the antimicrobial peptide Melp5

Melittin, a natural antimicrobial peptide comprising 26 amino acid residues, can kill bacteria by inducing pores in cell membranes. Clinical applications of melittin as an antibiotic require a thorough understanding of its poration mechanism and of mutations that enhance its antimicrobial activity. Previous experiments showed that Melp5, a variant of melittin with five mutations, exhibits a higher poration ability. However, the mechanism of the enhanced poration ability is not fully understood. Here, we investigated the mechanism by comparing the poration of melittin and Melp5 using coarse-grained (CG) and all-atom (AA) molecular dynamics (MD) simulations. We observe that Melp5 is likely to form a pore with 5 peptides (pentameric), while melittin is likely to form a pore with 4 peptides (tetrameric). Our atomistic MD simulations show that the pentameric pore of Melp5 has a higher water permeability than the tetrameric pore of melittin. We also analyze the stability of the pores of melittin and Melp5 by calculating the interaction energies of the pores. In particular, we investigate the effects of mutant residues on pore stability by calculating electrostatic and Lennard-Jones (LJ) interactions. These results should provide insights into the enhanced poration ability of Melp5 and push it toward applications.

preprint · 2023 · arXiv

Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment

The goal of sequential event prediction is to estimate the next event based on a sequence of historical events, with applications to sequential recommendation, user behavior analysis and clinical treatment. In practice, next-event prediction models are trained with sequential data collected at one time and need to generalize to newly arrived sequences in the remote future, which requires models to handle temporal distribution shift from training to testing. In this paper, we first take a data-generating perspective to reveal a negative result: existing approaches based on maximum likelihood estimation fail under distribution shift because of the latent context confounder, i.e., the common cause of the historical events and the next event. Then we devise a new learning objective based on backdoor adjustment and further harness variational inference to make it tractable for sequence learning problems. On top of that, we propose a framework with hierarchical branching structures for learning context-specific representations. Comprehensive experiments on diverse tasks (e.g., sequential recommendation) demonstrate the effectiveness, applicability and scalability of our method with various off-the-shelf models as backbones.

preprint · 2022 · arXiv

Baryonic $B$ meson decays

We review the two- and three-body baryonic $B$ decays with the dibaryon (${\bf B\bar B'}$) as the final states. Accordingly, we summarize the experimental data on the branching fractions, angular asymmetries, and $CP$ asymmetries. Using the $W$-boson annihilation (exchange) mechanism, the branching fractions of $B\to {\bf B \bf \bar B'}$ are shown to be interpretable. In the approach of perturbative QCD counting rules, we study the three-body decay channels. In particular, we review the $CP$ asymmetries of $B\to {\bf B\bar B'}M$, which are promising targets for measurement by the LHCb and Belle II experiments. Finally, we remark on the theoretical challenges in interpreting ${\cal B}(B^-\to p\bar p\rho^-)$ and ${\cal B}(B^-\to p\bar p\mu^-\bar\nu_\mu)$.

preprint · 2022 · arXiv

FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

Although Transformer-based methods have significantly improved state-of-the-art results for long-term series forecasting, they are not only computationally expensive but, more importantly, unable to capture the global view of a time series (e.g., the overall trend). To address these problems, we propose to combine the Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of the time series while Transformers capture more detailed structures. To further enhance the performance of the Transformer for long-term prediction, we exploit the fact that most time series tend to have a sparse representation in a well-known basis such as the Fourier basis, and develop a frequency enhanced Transformer. Besides being more effective, the proposed method, termed Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than the standard Transformer, with complexity linear in the sequence length. Our empirical studies with six benchmark datasets show that, compared with state-of-the-art methods, FEDformer can reduce prediction error by 14.8% and 22.6% for multivariate and univariate time series, respectively. Code is publicly available at https://github.com/MAZiqing/FEDformer.
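
The two ingredients named in this abstract, seasonal-trend decomposition and a sparse Fourier representation, can be illustrated with a small NumPy sketch. This is not the FEDformer implementation (see the repository linked above for that). The function names `moving_average_trend` and `top_k_fourier`, the window length, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def moving_average_trend(x, window):
    """Trend estimate: centered moving average with edge padding."""
    pad_left = (window - 1) // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    kernel = np.ones(window) / window
    # 'valid' convolution on the padded series returns len(x) points.
    return np.convolve(padded, kernel, mode="valid")

def top_k_fourier(x, k):
    """Keep only the k largest-magnitude Fourier modes of a real series."""
    spectrum = np.fft.rfft(x)
    keep = np.argsort(np.abs(spectrum))[-k:]
    sparse = np.zeros_like(spectrum)
    sparse[keep] = spectrum[keep]
    return np.fft.irfft(sparse, n=len(x))

# Hypothetical series: linear trend plus a period-32 seasonal component.
t = np.arange(256)
series = 0.05 * t + np.sin(2 * np.pi * t / 32)

trend = moving_average_trend(series, window=32)  # global profile
seasonal = series - trend                        # detailed structure
approx = top_k_fourier(seasonal, k=2)            # sparse frequency view
```

Here the seasonal part is dominated by a single Fourier mode, which is the kind of sparsity the abstract refers to; real series would typically need more modes.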

preprint · 2022 · arXiv

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach

Rule sets are highly interpretable logical models in which the predicates for decision are expressed in disjunctive normal form (DNF, OR-of-ANDs), or, equivalently, the overall model comprises an unordered collection of if-then decision rules. In this paper, we consider a submodular optimization based approach for learning rule sets. The learning problem is framed as a subset selection task in which a subset of all possible rules needs to be selected to form an accurate and interpretable rule set. We employ an objective function that exhibits submodularity and thus is amenable to submodular optimization techniques. To overcome the difficulty arising from the exponential-sized ground set of rules, the subproblem of searching for a rule is cast as another subset selection task that asks for a subset of features. We show it is possible to write the induced objective function for the subproblem as a difference of two submodular (DS) functions to make it approximately solvable by DS optimization algorithms. Overall, the proposed approach is simple, scalable, and likely to benefit from further research on submodular optimization. Experiments on real datasets demonstrate the effectiveness of our method.
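
To make the "OR-of-ANDs" structure concrete, the following is a minimal sketch of how a rule set in DNF could be evaluated. It only illustrates the model form described in the abstract, not the paper's learning algorithm. The function `rule_set_predict`, the dict-based rule encoding, and the feature names are hypothetical.

```python
def rule_set_predict(rules, example):
    """Return True if any rule fires on the example.

    `rules` is a list of rules; each rule is a dict mapping a feature
    name to the value it must take (an AND of conditions). The rule set
    as a whole is the OR of its rules, i.e. disjunctive normal form.
    """
    return any(
        all(example.get(feature) == value for feature, value in rule.items())
        for rule in rules
    )

# Hypothetical rule set: (fever AND cough) OR (contact)
rules = [
    {"fever": True, "cough": True},
    {"contact": True},
]

positive = rule_set_predict(rules, {"fever": True, "cough": True, "contact": False})
negative = rule_set_predict(rules, {"fever": True, "cough": False, "contact": False})
```

Because the rules are unordered and each is a plain conjunction, a prediction can be explained by pointing at the single rule that fired.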

preprint · 2022 · arXiv

Lieb lattices formed by real atoms on Ag(111) and their lattice constant dependent electronic properties

Scanning tunneling microscopy is a powerful tool for building artificial atomic structures that do not exist in nature but possess exotic properties. Here we constructed Lieb lattices with different lattice constants from real atoms, i.e., Fe atoms on Ag(111), and probed their electronic properties. We find a surprisingly long-range effective electron wavefunction overlap between Fe adatoms: it exhibits a $1/r^2$ dependence on the interatomic distance $r$ instead of the theoretically predicted exponential one. Combining control experiments, tight-binding and Green's function calculations, we attribute the observed long-range overlap to the surface state. Our findings not only enrich the understanding of electron wavefunction overlap, but also provide a convenient platform to design and explore artificial structures and future devices built from real atoms.

preprint · 2022 · arXiv

NetRCA: An Effective Network Fault Cause Localization Algorithm

Localizing the root cause of network faults is crucial to network operation and maintenance. However, due to complicated network architectures and wireless environments, as well as limited labeled data, accurately localizing the true root cause is challenging. In this paper, we propose a novel algorithm named NetRCA to deal with this problem. Firstly, we extract effective derived features from the original raw data by considering temporal, directional, attribution, and interaction characteristics. Secondly, we adopt multivariate time series similarity and label propagation to generate new training data from both labeled and unlabeled data to overcome the lack of labeled samples. Thirdly, we design an ensemble model which combines XGBoost, rule set learning, an attribution model, and a graph algorithm to fully utilize all data information and enhance performance. Finally, experiments and analysis are conducted on the real-world dataset from the ICASSP 2022 AIOps Challenge to demonstrate the superiority and effectiveness of our approach.

preprint · 2022 · arXiv

Planar Hall effect induced spin rectification effect and its strong impact on spin pumping measurements

Spin pumping is a technique widely used to generate pure spin currents and characterize spin-charge conversion in various systems. The sign reversal of the symmetric Lorentzian charge current under opposite magnetic field is generally accepted as the key criterion to identify its pure spin current origin. However, we herein find that the rectified voltage due to the planar Hall effect can exhibit a similar spurious signal, complicating and even misleading the analysis. The distribution of the microwave magnetic field and induction current has a strong influence on the magnetic field symmetry and lineshape of the obtained signal. We further demonstrate a geometry where the spin-charge conversion and the rectified voltage can be readily distinguished with a straightforward symmetry analysis.

preprint · 2022 · arXiv

Quasi-periodic oscillations of the X-ray burst from the magnetar SGR J1935+2154 and associated with the fast radio burst FRB 200428

The origin(s) and mechanism(s) of fast radio bursts (FRBs), which are short radio pulses from cosmological distances, have remained a major puzzle since their discovery. We report a strong quasi-periodic oscillation (QPO) of 40 Hz in the X-ray burst from the magnetar SGR J1935+2154 associated with FRB 200428, significantly detected with the Hard X-ray Modulation Telescope (Insight-HXMT) and also hinted at by the Konus-Wind data. QPOs from magnetar bursts have only rarely been detected; our 3.4 sigma (p-value is 2.9e-4) detection of the QPO reported here reveals the strongest QPO signal observed from magnetars (except in some very rare giant flares), making this X-ray burst unique among magnetar bursts. The two X-ray spikes coinciding with the two FRB pulses are also among the peaks of the QPO. Our results suggest that at least some FRBs are related to strong oscillation processes of neutron stars. We also show that the significance of the QPO signal may be overestimated, and the errors of the QPO parameters underestimated, if the QPO exists in only a fraction of the time series of the X-ray burst used to calculate the Leahy-normalized periodogram.

preprint · 2022 · arXiv

Robust Time Series Dissimilarity Measure for Outlier Detection and Periodicity Detection

Dynamic time warping (DTW) is an effective dissimilarity measure in many time series applications. Despite its popularity, it is sensitive to noise and outliers, which leads to the singularity problem and bias in the measurement. The time complexity of DTW is quadratic in the length of the time series, making it inapplicable in real-time applications. In this paper, we propose a novel time series dissimilarity measure named RobustDTW to reduce the effects of noise and outliers. Specifically, RobustDTW estimates the trend and optimizes the time warp in an alternating manner by utilizing our designed temporal graph trend filtering. To improve efficiency, we propose a multi-level framework that estimates the trend and the warp function at a lower resolution, and then repeatedly refines them at a higher resolution. Based on the proposed RobustDTW, we further extend it to periodicity detection and outlier time series detection. Experiments on real-world datasets demonstrate the superior performance of RobustDTW compared to DTW variants in both outlier time series detection and periodicity detection.
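
For readers unfamiliar with the baseline, the quadratic cost and outlier sensitivity mentioned above are easy to see in the textbook dynamic-programming form of DTW. The sketch below is that classic algorithm with an absolute-difference cost, not RobustDTW. The function name `dtw_distance` and the toy inputs are chosen here for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW with absolute-difference cost; O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# A repeated sample is absorbed by warping; a single outlier is not.
warped = dtw_distance([0, 1, 1, 2], [0, 1, 2])
with_outlier = dtw_distance([0, 0, 10, 0], [0, 0, 0, 0])
```

The nested loops fill an (n+1) by (m+1) table, which is where the quadratic time comes from, and every alignment has to pay for the outlier in full, which is the sensitivity the abstract sets out to reduce.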

preprint · 2022 · arXiv

RobustScaler: QoS-Aware Autoscaling for Complex Workloads

Autoscaling is a critical component for efficient resource utilization with satisfactory quality of service (QoS) in cloud computing. This paper investigates proactive autoscaling for widely used scaling-per-query applications where scaling is required for each query, such as container registry and function-as-a-service (FaaS). In these scenarios, the workload often exhibits high uncertainty with complex temporal patterns like periodicity, noise and outliers. Conservative strategies that scale out unnecessarily many instances lead to high resource costs, whereas aggressive strategies may result in poor QoS. We present RobustScaler to achieve a superior trade-off between cost and QoS. Specifically, we design a novel autoscaling framework based on non-homogeneous Poisson process (NHPP) modeling and stochastically constrained optimization. Furthermore, we develop a specialized alternating direction method of multipliers (ADMM) to efficiently train the NHPP model, and rigorously prove the QoS guarantees delivered by our optimization-based proactive strategies. Extensive experiments show that RobustScaler outperforms common baseline autoscaling strategies on various real-world traces, with large margins for complex workload patterns.

preprint · 2022 · arXiv

Time Series Data Augmentation for Deep Learning: A Survey

Deep learning has recently performed remarkably well on many time series analysis tasks. The superior performance of deep neural networks relies heavily on a large amount of training data to avoid overfitting. However, the labeled data of many real-world time series applications may be limited, such as classification in medical time series and anomaly detection in AIOps. As an effective way to enhance the size and quality of the training data, data augmentation is crucial to the successful application of deep learning models on time series data. In this paper, we systematically review different data augmentation methods for time series. We propose a taxonomy for the reviewed methods, and then provide a structured review of these methods by highlighting their strengths and limitations. We also empirically compare different data augmentation methods for different tasks including time series classification, anomaly detection, and forecasting. Finally, we discuss and highlight five future directions to provide useful research guidance.

preprint · 2022 · arXiv

TreeDRNet: A Robust Deep Model for Long Term Time Series Forecasting

Various deep learning models, especially some of the latest Transformer-based approaches, have greatly improved the state-of-the-art performance for long-term time series forecasting. However, those Transformer-based models suffer severe performance deterioration with prolonged input length, which prevents them from using extended historical information. Moreover, these methods tend to handle complex examples in long-term forecasting with increased model complexity, which often leads to a significant increase in computation and less robust performance (e.g., overfitting). We propose a novel neural network architecture, called TreeDRNet, for more effective long-term forecasting. Inspired by robust regression, we introduce a doubly residual link structure to make prediction more robust. Building upon the Kolmogorov-Arnold representation theorem, we explicitly introduce feature selection, model ensemble, and a tree structure to further utilize the extended input sequence, which improves the robustness and representation power of TreeDRNet. Unlike previous deep models for sequential forecasting, TreeDRNet is built entirely on multilayer perceptrons and thus enjoys high computational efficiency. Our extensive empirical studies show that TreeDRNet is significantly more effective than state-of-the-art methods, reducing prediction errors by 20% to 40% for multivariate time series. In particular, TreeDRNet is over 10 times more efficient than Transformer-based methods. The code will be released soon.

preprint · 2021 · arXiv

RobustPeriod: Time-Frequency Mining for Robust Multiple Periodicity Detection

Periodicity detection is a crucial step in time series tasks, including monitoring and forecasting of metrics in many areas, such as IoT applications and self-driving database management systems. In many of these applications, multiple periodic components exist and are often interlaced with each other. Such dynamic and complicated periodic patterns make accurate periodicity detection difficult. In addition, other components in the time series, such as trend, outliers and noise, pose further challenges for accurate periodicity detection. In this paper, we propose a robust and general framework for multiple periodicity detection. Our algorithm applies the maximal overlap discrete wavelet transform to transform the time series into multiple temporal-frequency scales such that different periodic components can be isolated. We rank them by wavelet variance, and then at each scale robustly detect a single periodicity with our proposed Huber-periodogram and Huber-ACF. We rigorously prove the theoretical properties of the Huber-periodogram and justify the use of Fisher's test on the Huber-periodogram for periodicity detection. To further refine the detected periods, we compute the unbiased autocorrelation function from the Huber-periodogram, based on the Wiener-Khinchin theorem, for improved robustness and efficiency. Experiments on synthetic and real-world datasets show that our algorithm outperforms other popular ones for both single and multiple periodicity detection.
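
As background for the single-periodicity step, the sketch below computes an ordinary (non-robust) periodogram and Fisher's g statistic, the ratio of the largest periodogram ordinate to their sum. It is a simplified stand-in under stated assumptions: it uses the plain FFT periodogram rather than the paper's Huber-periodogram, it omits the wavelet stage and the p-value computation, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def periodogram(x):
    """Ordinary periodogram at Fourier frequencies k = 1..floor((N-1)/2)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    q = (len(x) - 1) // 2          # excludes DC and, for even N, Nyquist
    spectrum = np.fft.rfft(x)
    return (np.abs(spectrum[1:q + 1]) ** 2) / len(x)

def fisher_g(power):
    """Fisher's g statistic: share of total power in the largest peak."""
    return power.max() / power.sum()

def dominant_period(x):
    """Period (in samples) of the strongest periodogram peak."""
    power = periodogram(x)
    k = int(np.argmax(power)) + 1  # +1 because the DC bin was dropped
    return len(x) / k

# Hypothetical noisy series with a period of 24 samples.
rng = np.random.default_rng(0)
t = np.arange(240)
noisy = np.sin(2 * np.pi * t / 24) + 0.3 * rng.standard_normal(t.size)

period = dominant_period(noisy)
g = fisher_g(periodogram(noisy))
```

A value of g close to 1 means one frequency carries most of the power. Because each ordinate is a squared magnitude, a few large outliers can distort this ordinary periodogram, which is the motivation the abstract gives for a robust variant.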

preprint · 2020 · arXiv

Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT

Chest computed tomography (CT) has become an effective tool to assist the diagnosis of coronavirus disease-19 (COVID-19). Due to the outbreak of COVID-19 worldwide, using computer-aided diagnosis techniques for COVID-19 classification based on CT images could largely alleviate the burden on clinicians. In this paper, we propose an Adaptive Feature Selection guided Deep Forest (AFS-DF) for COVID-19 classification based on chest CT images. Specifically, we first extract location-specific features from CT images. Then, in order to capture the high-level representation of these features with relatively small-scale data, we leverage a deep forest model to learn a high-level representation of the features. Moreover, we propose a feature selection method based on the trained deep forest model to reduce the redundancy of features, where the feature selection can be adaptively incorporated with the COVID-19 classification model. We evaluated our proposed AFS-DF on a COVID-19 dataset with 1495 patients with COVID-19 and 1027 patients with community acquired pneumonia (CAP). The accuracy (ACC), sensitivity (SEN), specificity (SPE) and AUC achieved by our method are 91.79%, 93.05%, 89.95% and 96.35%, respectively. Experimental results on the COVID-19 dataset suggest that the proposed AFS-DF achieves superior performance in COVID-19 vs. CAP classification, compared with 4 widely used machine learning methods.

preprint · 2020 · arXiv

Discovery of oscillations above 200 keV in a black hole X-ray binary with Insight-HXMT

Low-frequency quasi-periodic oscillations (LFQPOs) are commonly found in black hole X-ray binaries, and their origin is still under debate. The properties of LFQPOs at high energies (above 30 keV) are closely related to the nature of the accretion flow in the innermost regions, and thus play a crucial role in critically testing various theoretical models. The Hard X-ray Modulation Telescope (Insight-HXMT) is capable of detecting emissions above 30 keV, and is therefore an ideal instrument to do so. Here we report the discovery of LFQPOs above 200 keV in the new black hole MAXI J1820+070 in the X-ray hard state, which allows us to understand the behaviours of LFQPOs at hundreds of kiloelectronvolts. The phase lag of the LFQPO is constant around zero below 30 keV, and becomes a soft lag (that is, the high-energy photons arrive first) above 30 keV. The soft lag gradually increases with energy and reaches ~0.9s in the 150-200 keV band. The detection at energies above 200 keV, the large soft lag and the energy-related behaviors of the LFQPO pose a great challenge for most currently existing models, but suggest that the LFQPO probably originates from the precession of a small-scale jet.

preprint · 2020 · arXiv

Multi-Sound-Source Localization Using Machine Learning for Small Autonomous Unmanned Vehicles with a Self-Rotating Bi-Microphone Array

While vision-based localization techniques have been widely studied for small autonomous unmanned vehicles (SAUVs), sound-source localization capabilities have not been fully enabled for SAUVs. This paper presents two novel approaches for SAUVs to perform three-dimensional (3D) multi-sound-source localization (MSSL) using only the inter-channel time difference (ICTD) signal generated by a self-rotating bi-microphone array. The two proposed approaches are based on two machine learning techniques, namely the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) algorithms, respectively, whose performances are tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their 3D orientations in a reverberant environment.

preprint · 2020 · arXiv

The Background Model of the Medium Energy X-ray telescope of Insight-HXMT

The Medium Energy X-ray Telescope (ME) is one of the main payloads of the Hard X-ray Modulation Telescope (dubbed Insight-HXMT). The background of Insight-HXMT/ME is mainly caused by environmental charged particles, and the background intensity is modulated remarkably by the geomagnetic field as well as the geographical location. At the same geographical location, the background spectral shape is stable but the intensity varies with the level of the environmental charged particles. In this paper, we develop a model to estimate the ME background based on the ME database that is established with two years of blank sky observations at high Galactic latitude. In this model, the entire geographical area covered by Insight-HXMT is divided into grids of $5^{\circ}\times5^{\circ}$ in the geographical coordinate system. For each grid, the background spectral shape can be obtained from the background database and the intensity can be corrected by the contemporary count rate of the blind FOV detectors. Thus the background spectrum can be obtained by accumulating the background of all the grids passed by Insight-HXMT during the effective observational time. The model test with the blank sky observations shows that the systematic error of the background estimation in $8.9-44.0$ keV is $\sim1.3\%$ for a pointing observation with an average exposure of $\sim5.5$ ks. We also find that the systematic error is anti-correlated with the exposure, which indicates that the systematic error is partly contributed by the statistical error of the count rate measured by the blind FOV detectors.

preprint · 2019 · arXiv

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

As China's first X-ray astronomical satellite, the Hard X-ray Modulation Telescope (HXMT), which was dubbed Insight-HXMT after its launch on June 15, 2017, is a wide-band (1-250 keV) slat-collimator-based X-ray astronomy satellite with the capability of all-sky monitoring in 0.2-3 MeV. It was designed to perform pointing, scanning and gamma-ray burst (GRB) observations and, based on the Direct Demodulation Method (DDM), the image of the scanned sky region can be reconstructed. Here we give an overview of the mission and its progress, including payload, core sciences, ground calibration/facility, ground segment, data archive, software, in-orbit performance, calibration, background model, observations and some preliminary results.

preprint · 2019 · arXiv

Performance tests of boron-coated straw detectors with thermal and cold neutron beams

Prototypes of newly developed boron-coated straw (BCS) detectors have been tested in the thermal and cold neutron energy ranges. Their neutron detection performance has been benchmarked against the industry standard (detector tubes filled with 3He gas). The tests show that the BCS straws perform near their theoretical limit regarding the detection efficiency, which is adequate for scientific instruments in the cold neutron energy range. The BCS detectors perform on par with 3He tubes in terms of signal to noise and timing resolution, and superior regarding longitudinal spatial resolution.

preprint · 2019 · arXiv

The Medium Energy (ME) X-ray telescope onboard the Insight-HXMT astronomy satellite

The Medium Energy X-ray telescope (ME) is one of the three main telescopes on board the Insight Hard X-ray Modulation Telescope (Insight-HXMT) astronomy satellite. ME contains 1728 pixels of Si-PIN detectors sensitive in 5-30 keV with a total geometrical area of 952 cm$^2$. Application Specific Integrated Circuit (ASIC) chips, VA32TA6, are used to achieve low power consumption and low readout noise. The collimators define three kinds of fields of view (FOVs) for the telescope: 1°×4°, 4°×4°, and blocked ones. Combinations of such FOVs can be used to estimate the in-orbit X-ray and particle background components. The energy resolution of ME is ~3 keV at 17.8 keV (FWHM) and the time resolution is 255 μs. In this paper, we introduce the design and performance of ME.

preprint · 2010 · arXiv

Strong enhancement of chlorophyll a concentration by a weak typhoon

Recent studies demonstrate that chlorophyll a (chl a) concentrations in the surface ocean can be significantly enhanced by typhoons. The present study investigated chl a concentrations in the middle of the South China Sea (SCS) from 1997 to 2007. Only the Category 1 (minimal) Typhoon Hagibis (2007) had a notable effect on the chl a concentrations. Typhoon Hagibis had a strong upwelling potential due to its location near the equator, and the forcing time of the typhoon (>82 h) was much longer than the geostrophic adjustment time (~63 h). The higher upwelling velocity and the longer forcing time increased the depth of the mixed layer, which consequently induced a strong phytoplankton bloom that accounted for about 30% of the total annual chl a concentration in the middle of the SCS. The implication is that the forcing time of a typhoon should be long enough to establish a strong upwelling and consequently to induce significant upper ocean responses.

preprint · 2009 · arXiv

Impacts of Typhoons on the Kuroshio Large Meander: Observation Evidences

The formation of the Kuroshio large meander in summer 2004 was investigated using cruise data, Argo profile data, and satellite remote sensing data. We validated the point that a cyclonic eddy contributes to the large meander. In addition, the impacts of typhoons on Kuroshio meanders were studied. From 29 July to 4 August, the typhoons stirred the ocean and upwelled deep water, which enhanced the existing cyclonic eddy and immediately produced a drastic meander of the Kuroshio. Moreover, the unexpected typhoons in June 2004 also contributed to the initial meander at Tokara Strait. The result suggests an alternative meander mechanism of the Kuroshio path via typhoon-eddy-Kuroshio interactions. It is argued that typhoons, accompanied by cyclonic eddies, might play crucial roles in meanders of the Kuroshio. This will provide a more comprehensive understanding of the dynamics of western boundary flows like the Kuroshio and the Gulf Stream, and will be useful in eddy-resolving models.

preprint · 2006 · arXiv

General stability criterion of inviscid parallel flow

A more restrictively general stability criterion of two-dimensional inviscid parallel flow is obtained analytically. First, a sufficient criterion for stability is found as either $-\mu_1<\frac{U''}{U-U_s}<0$ or $0<\frac{U''}{U-U_s}$ in the flow, where $U_s$ is the velocity at the inflection point and $\mu_1$ is the eigenvalue of Poincaré's problem. Second, this criterion is generalized to barotropic geophysical flows on the $\beta$-plane. Based on the criteria, the flows are divided into different categories of stable flows, which may simplify further investigations. The connections between the present criteria and Arnol'd's nonlinear criteria are also discussed. These results extend the former criteria obtained by Rayleigh, Tollmien and Fjørtoft and may stimulate future research on the mechanism of hydrodynamic instability.
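
For orientation, the criterion quoted in this abstract can be set beside the classical results it extends. The block below is a restatement in display form, assuming the standard Rayleigh-equation setting for a normal-mode perturbation of a parallel flow $U(y)$. The streamfunction amplitude $\phi$, wavenumber $k$, complex phase speed $c$ and inflection location $y_s$ are notation introduced here and do not appear in the abstract.

```latex
% Rayleigh's equation for a normal-mode perturbation of U(y)
(U - c)\,(\phi'' - k^2 \phi) - U''\,\phi = 0

% Rayleigh: instability requires an inflection point, U''(y_s) = 0
% Fjortoft: instability further requires U''\,(U - U_s) < 0 somewhere,
%           where U_s = U(y_s)

% Sufficient condition for stability as stated in the abstract
-\mu_1 < \frac{U''}{U - U_s} < 0
\quad\text{or}\quad
0 < \frac{U''}{U - U_s}
```

Read this way, the second branch corresponds to the case excluded by Fjørtoft's condition, while the first branch adds a band of flows that satisfy Fjørtoft's necessary condition for instability yet are still stable according to the abstract.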