Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
49 works
0 followers
31 topics
4 close collaborators

Published work

49 published items

Preprint · 2026 · arXiv

Intelligent Nano-Fingerprinting: An Efficient and Precise Approach for Liquid Biopsy

Biological matrices are rich in information about life processes, serving both as invaluable media for assessing an individual's overall physiological status and its dynamic fluctuations and as crucial foundations for disease diagnosis. However, the inherent complexity of these matrices, coupled with our incomplete understanding of their full composition, presents significant challenges for comprehensive analysis and accurate diagnostic interpretation. The advent of single-molecule technologies has revolutionized biomedical research, enabling direct observation of life processes at the molecular scale. We propose an Intelligent Nano-Fingerprinting strategy based on single-molecule nanopore technology, designed to capture the global molecular fingerprints of complex plasma matrices. We further develop an intelligent algorithmic model that classifies plasma samples precisely. The approach is simple and efficient, and it has considerable potential for large-scale adoption and transfer to other applications.

Preprint · 2026 · arXiv

Non-volatile Programmable Photonic Integrated Circuits using Mechanically Latched MEMS: A System-Level Scheme Enabling Power-Connection-Free Operation Without Performance Compromise

Programmable photonic integrated circuits (PPICs) offer a versatile platform for implementing diverse optical functions on a generic hardware mesh. However, power consumption is a critical barrier to their scalability. We therefore propose a novel non-volatile PPIC architecture that uses MEMS with mechanical latching, enabling stable passive operation without any power connection once configured. To ensure practical applicability, we present a system-level solution comprising both this hardware innovation and an accompanying automatic, error-resilient configuration algorithm. The algorithm compensates for the lack of continuous tunability inherent in the non-volatile hardware design, enabling this new operating paradigm without compromising performance while also ensuring robustness against fabrication errors. Functional simulations validate the proposed scheme by configuring five distinct functionalities of varying complexity: a Mach-Zehnder interferometer (MZI), an MZI lattice filter, an optical ring resonator (ORR), a double-ORR ring-loaded MZI, and a triple-ORR coupled-resonator waveguide filter. The results demonstrate that our non-volatile scheme achieves performance equivalent to that of conventional PPICs, and a robustness analysis shows that it is strongly robust against various fabrication errors. We also explore the trade-off between the hardware design complexity of such a non-volatile scheme and its performance. This study establishes a viable pathway to a new generation of power-connection-free PPICs, providing a practical and scalable solution for future photonic systems.
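The abstract does not spell out the configuration algorithm, but the core difficulty it addresses, reaching a target optical response when each actuator can only sit in a few latched states, can be illustrated with a toy. This is a minimal sketch, assuming an ideal balanced MZI with transmission cos²(Δφ/2) and a hypothetical phase shifter built from binary-weighted latched segments that are either on or off; `best_latched_config` is an illustrative name, not the authors' algorithm.

```python
import itertools
import math

def mzi_transmission(delta_phi):
    # Assumed convention: ideal balanced MZI with 50:50 couplers, T = cos^2(delta_phi / 2).
    return math.cos(delta_phi / 2) ** 2

def best_latched_config(segment_phases, target_t):
    """Choose on/off states for latched phase segments (hypothetical design)
    so that the MZI transmission is as close as possible to target_t.

    segment_phases may be measured rather than nominal values, so per-segment
    fabrication errors are accounted for by the search itself.
    """
    best_err, best_states = None, None
    # Exhaustive search over all 2^n latch combinations.
    for states in itertools.product((0, 1), repeat=len(segment_phases)):
        phi = sum(p for p, on in zip(segment_phases, states) if on)
        err = abs(mzi_transmission(phi) - target_t)
        if best_err is None or err < best_err:
            best_err, best_states = err, states
    return best_states, best_err

# Hypothetical binary-weighted segments: pi/2, pi/4, pi/8, pi/16.
nominal = [math.pi / 2, math.pi / 4, math.pi / 8, math.pi / 16]
states, err = best_latched_config(nominal, target_t=0.5)
```

For a 3 dB split (target 0.5) with the nominal segments, the search switches on only the π/2 segment. Exhaustive search is only practical for a handful of segments; a real mesh would need a smarter optimizer.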

Preprint · 2026 · arXiv

Parallel Latent Reasoning for Sequential Recommendation

Capturing complex user preferences from sparse behavioral sequences remains a fundamental challenge in sequential recommendation. Recent latent reasoning methods have shown promise by extending test-time computation through multi-step reasoning, yet they exclusively rely on depth-level scaling along a single trajectory, suffering from diminishing returns as reasoning depth increases. To address this limitation, we propose Parallel Latent Reasoning (PLR), a novel framework that pioneers width-level computational scaling by exploring multiple diverse reasoning trajectories simultaneously. PLR constructs parallel reasoning streams through learnable trigger tokens in continuous latent space, preserves diversity across streams via global reasoning regularization, and adaptively synthesizes multi-stream outputs through mixture-of-reasoning-streams aggregation. Extensive experiments on three real-world datasets demonstrate that PLR substantially outperforms state-of-the-art baselines while maintaining real-time inference efficiency. Theoretical analysis further validates the effectiveness of parallel reasoning in improving generalization capability. Our work opens new avenues for enhancing reasoning capacity in sequential recommendation beyond existing depth scaling.
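The three PLR ingredients named above (a trigger token per stream, a diversity regularizer, and mixture aggregation) can be sketched in plain Python. This is a toy under stated assumptions, not the paper's model: a `tanh` map stands in for a latent reasoning step, the gate is a hypothetical dot-product softmax, and mean pairwise cosine similarity is a hypothetical stand-in for the global reasoning regularization.

```python
import math

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(u, s):
    return [a * s for a in u]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-12)

def reason_step(h):
    # Placeholder for one latent reasoning step (a real model would apply a Transformer block).
    return [math.tanh(x) for x in h]

def parallel_latent_reasoning(h, triggers, depth, gate):
    """Run one stream per trigger vector (width), each for `depth` steps (depth),
    then aggregate the streams with softmax gate weights."""
    streams = []
    for trigger in triggers:
        s = add(h, trigger)  # each stream starts from the user state plus its own trigger
        for _ in range(depth):
            s = reason_step(s)
        streams.append(s)
    # Hypothetical mixture aggregation: softmax over dot-product gate scores.
    scores = [dot(gate, s) for s in streams]
    m = max(scores)
    exp_scores = [math.exp(x - m) for x in scores]
    z = sum(exp_scores)
    weights = [e / z for e in exp_scores]
    out = [0.0] * len(h)
    for w, s in zip(weights, streams):
        out = add(out, scale(s, w))
    return out, streams, weights

def diversity_penalty(streams):
    """Mean pairwise cosine similarity; minimizing it pushes streams apart."""
    pairs = [(i, j) for i in range(len(streams)) for j in range(i + 1, len(streams))]
    return sum(cosine(streams[i], streams[j]) for i, j in pairs) / len(pairs)
```

In this sketch, width scaling means adding entries to `triggers`, while depth scaling means raising `depth`.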

Preprint · 2024 · arXiv

Tracking Surface Charge Dynamics on Single Nanoparticles

Surface charges play a fundamental role in physics and chemistry, particularly in shaping the catalytic properties of nanomaterials. Tracking nanoscale surface charge dynamics remains challenging due to the involved length and time scales. Here, we demonstrate real-time access to the nanoscale charge dynamics on dielectric nanoparticles employing reaction nanoscopy. We present a four-dimensional visualization of the non-linear charge dynamics on strong-field irradiated single SiO$_2$ nanoparticles with femtosecond-nanometer resolution and reveal how surface charges affect surface molecular bonding with quantum dynamical simulations. We performed semi-classical simulations to uncover the roles of diffusion and charge loss in the surface charge redistribution process. Understanding nanoscale surface charge dynamics and its influence on chemical bonding on a single nanoparticle level unlocks an increased ability to address global needs in renewable energy and advanced healthcare.

Preprint · 2023 · arXiv

RL-GA: A Reinforcement Learning-Based Genetic Algorithm for Electromagnetic Detection Satellite Scheduling Problem

The electromagnetic detection satellite scheduling problem (EDSSP) has attracted attention because of the need to detect a large number of targets. This paper proposes a mixed-integer programming model for the EDSSP and a genetic algorithm based on reinforcement learning (RL-GA). The model considers numerous factors that affect electromagnetic detection, such as detection mode and bandwidth. The RL-GA embeds a Q-learning method into an improved genetic algorithm, so that the evolution of each individual depends on the decision of the agent: Q-learning guides the population search by choosing evolution operators, which lets the reinforcement learning method make effective use of the search information. We design a reward function to update the Q value and, based on the problem characteristics, propose a new <state, action> combination. The RL-GA also uses an elite individual retention strategy to improve search performance. In addition, a task time window selection algorithm (TTWSA) is proposed to evaluate the performance of population evolution. Experiments on multiple instances show that the RL-GA solves the EDSSP effectively and performs better than state-of-the-art algorithms in several aspects.
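The central mechanism here, a Q-learning agent choosing which evolution operator each individual uses, can be sketched as a small control loop. This is a minimal sketch, assuming a toy bit-string fitness in place of the EDSSP model, two hypothetical operators, a two-value state ("improved"/"stalled"), and a reward equal to the fitness gain; the paper's actual <state, action> design, reward function and TTWSA are not reproduced.

```python
import random

def fitness(bits):
    # Toy stand-in for the scheduling objective: number of ones.
    return sum(bits)

def mutate(bits, rng):
    # Flip one random bit.
    child = bits[:]
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child

def crossover(bits, elite, rng):
    # One-point crossover with the current elite individual.
    cut = rng.randrange(1, len(bits))
    return bits[:cut] + elite[cut:]

ACTIONS = ("mutate", "crossover")   # hypothetical operator set
STATES = ("improved", "stalled")    # hypothetical state definition

def rl_ga(n_bits=20, pop_size=10, generations=60, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    state = ["stalled"] * pop_size
    for _ in range(generations):
        elite = max(pop, key=fitness)
        for k in range(pop_size):
            s = state[k]
            # Epsilon-greedy choice of evolution operator.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            child = mutate(pop[k], rng) if a == "mutate" else crossover(pop[k], elite, rng)
            reward = fitness(child) - fitness(pop[k])
            s_next = "improved" if reward > 0 else "stalled"
            # Standard Q-learning update.
            best_next = max(q[(s_next, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            if reward >= 0:
                pop[k] = child  # accept non-worsening children
            state[k] = s_next
        # Elite retention: the best individual of the generation replaces the worst.
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(pop[worst]) < fitness(elite):
            pop[worst] = elite[:]
    return max(pop, key=fitness), q
```

Because children are only accepted when they do not worsen fitness and the elite is retained, the best fitness in this toy never decreases; the learned `q` table records which operator has paid off in each state.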

Preprint · 2022 · arXiv

A State-of-the-art Survey of U-Net in Microscopic Image Analysis: from Simple Usage to Structure Mortification

Image analysis technology is used to overcome the shortcomings of traditional manual methods in disease diagnosis, wastewater treatment, and environmental change monitoring, and convolutional neural networks (CNNs) play an important role in microscopic image analysis. Image segmentation is an important step in detection, tracking, monitoring, feature extraction, modeling, and analysis, and U-Net has increasingly been applied to microscopic image segmentation. This paper reviews the development history of U-Net and analyzes the research results of the segmentation methods developed since its emergence, together with a comprehensive review of related papers. It first summarizes the improved variants of U-Net and then lists image segmentation techniques and the improvements introduced over the years. Finally, focusing on the different improvement strategies for U-Net across papers, it reviews the related work for each application target according to detailed technical categories to facilitate future research. Researchers can thereby follow how the technology has developed and keep up with future trends in this interdisciplinary field.

Preprint · 2022 · arXiv

AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks

Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line and then aggregates multi-omics features to predict drug response using a novel structure called the Graph edge-aware Network (GeNet). AGMI is the first approach to explore gene-constraint-based multi-omics integration for DRP over the whole genome using GNNs. Empirical experiments on the CCLE and GDSC datasets show that AGMI outperforms state-of-the-art DRP methods by 8.3% to 34.2% on four metrics. Our data and code are available at https://github.com/yivan-WYYGDSG/AGMI.

Preprint · 2022 · arXiv

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation

Computer-aided medical image segmentation has been widely applied in diagnosis and treatment to obtain clinically useful information about the shapes and volumes of target organs and tissues. In the past several years, convolutional neural network (CNN) based methods (e.g., U-Net) have dominated this area, but they still capture long-range information inadequately. Hence, recent work has presented computer vision Transformer variants for medical image segmentation and obtained promising performance. Such Transformers model long-range dependency by computing pair-wise patch relations. However, they incur prohibitive computational costs, especially on 3D medical images (e.g., CT and MRI). In this paper, we propose a new method called Dilated Transformer, which conducts self-attention for pair-wise patch relations captured alternately in local and global scopes. Inspired by dilated convolution kernels, we conduct the global self-attention in a dilated manner, enlarging receptive fields without increasing the number of patches involved and thus reducing computational costs. Based on this design, we construct a U-shaped encoder-decoder hierarchical architecture called D-Former for 3D medical image segmentation. Experiments on the Synapse and ACDC datasets show that our D-Former model, trained from scratch, outperforms various competitive CNN-based or Transformer-based segmentation models at a low computational cost, without a time-consuming pre-training process.
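Why dilated global attention keeps the cost down is easiest to see by looking only at which patches attend to each other. This is a minimal sketch over a 1D sequence of patches (a flattened 3D volume in practice), assuming equal-size groups of `window` patches and a strict even/odd layer alternation; the grouping rule and parameter names are illustrative assumptions, and no attention weights are computed.

```python
def local_groups(n_patches, window):
    """Local scope: contiguous windows of neighbouring patches."""
    assert n_patches % window == 0
    return [list(range(start, start + window)) for start in range(0, n_patches, window)]

def dilated_groups(n_patches, window):
    """Dilated global scope: every `stride`-th patch, so each group of
    `window` patches spans the whole sequence."""
    assert n_patches % window == 0
    stride = n_patches // window
    return [list(range(offset, n_patches, stride)) for offset in range(stride)]

def scope_for_layer(layer_index):
    # Assumed alternation: even layers attend locally, odd layers in the dilated global scope.
    return "local" if layer_index % 2 == 0 else "dilated"

def attention_pairs(groups):
    """Number of query-key pairs when self-attention is restricted to each group."""
    return sum(len(g) ** 2 for g in groups)
```

With 16 patches and `window=4`, both scopes evaluate 64 query-key pairs per layer instead of the 256 of full self-attention, while the dilated groups (e.g. patches 0, 4, 8, 12) still reach across the whole sequence.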

Preprint · 2022 · arXiv

DANets: Deep Abstract Networks for Tabular Data Classification and Regression

Tabular data are ubiquitous in real-world applications. Although the machine learning community has developed many commonly used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet), few of them are effective for tabular data, and few designs are adequately tailored to tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called the Abstract Layer (AbstLay), which learns to explicitly group correlated input features and generate higher-level features for semantic abstraction. We also design a structure re-parameterization method to compress the learned AbstLay, reducing the computational complexity by a clear margin in the inference phase. A special basic block is built from AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that AbstLay and DANets are effective for tabular data classification and regression and that their computational complexity compares favorably with competitive methods. We also evaluate the performance gains of DANet as it goes deeper, verifying the extensibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
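The abstract says the learned AbstLay is compressed by structure re-parameterization but gives no details. The general principle such methods rely on is that affine maps with no nonlinearity between them collapse into a single affine map at inference time. The following is a minimal sketch of that generic folding step, not the DANets procedure; the weights are made up for illustration.

```python
def matmul(a, b):
    """Plain-Python product of matrices a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, x):
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]

def fold_affine(w1, b1, w2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into a single y = W x + b."""
    w = matmul(w2, w1)
    b = [wb + c for wb, c in zip(matvec(w2, b1), b2)]
    return w, b

# Made-up training-time branch: a 2 -> 3 affine map followed by a 3 -> 2 affine map.
w1, b1 = [[1, 2], [0, 1], [3, -1]], [1, 0, 2]
w2, b2 = [[1, 0, 1], [2, 1, 0]], [0.5, -1]
w, b = fold_affine(w1, b1, w2, b2)

x = [2, -1]
hidden = [h + c for h, c in zip(matvec(w1, x), b1)]
two_step = [y + c for y, c in zip(matvec(w2, hidden), b2)]
one_step = [y + c for y, c in zip(matvec(w, x), b)]
```

Both paths give the same output, but the folded form needs one matrix-vector product instead of two, which is where the inference-time saving comes from.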

Preprint · 2022 · arXiv

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Self-supervised learning (SSL) methods have proven very successful in automatic speech recognition (ASR). These improvements have mostly been reported on highly curated datasets such as LibriSpeech and for non-streaming end-to-end ASR models. However, a pivotal characteristic of SSL is that it can exploit any untranscribed audio data. In this paper, we provide a full exploration of how to use uncurated audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model. More specifically, we present (1) the effect of an Audio Event Detection (AED) model in the data pre-processing pipeline, (2) an analysis of the choice of optimizer and learning rate scheduling, (3) a comparison of recently developed contrastive losses, and (4) a comparison of various pre-training strategies, such as in-domain versus out-of-domain pre-training data, monolingual versus multilingual pre-training data, multi-head versus single-head multilingual SSL, and supervised pre-training versus SSL. The experimental results show that SSL pre-training with in-domain uncurated data achieves better performance than all the alternative out-of-domain pre-training strategies.

Preprint · 2022 · arXiv

DialMed: A Dataset for Dialogue-based Medication Recommendation

Medication recommendation is a crucial task for intelligent healthcare systems. Previous studies mainly recommend medications from electronic health records (EHRs). However, EHRs may ignore or omit details of doctor-patient interactions that are essential for automatic medication recommendation. We therefore make the first attempt to recommend medications from conversations between doctors and patients. In this work, we construct DIALMED, the first high-quality dataset for the medical dialogue-based medication recommendation task. It contains 11,996 medical dialogues related to 16 common diseases from 3 departments, together with 70 corresponding common medications. Furthermore, we propose a Dialogue structure and Disease knowledge aware Network (DDN), in which a QA Dialogue Graph mechanism models the dialogue structure and a knowledge graph introduces external disease knowledge. Extensive experimental results demonstrate that the proposed method is a promising solution for recommending medications from medical dialogues. The dataset and code are available at https://github.com/f-window/DialMed.

Preprint · 2022 · arXiv

Direct numerical simulations of incompressible multiphase electrohydrodynamic flow with single-phase transportation schemes

In the present study, two schemes, named face discernment and flux correction, are proposed to achieve single-phase transport of free charge in multiphase electrohydrodynamic (EHD) problems. Many EHD phenomena occur between air and a liquid, and because air conducts poorly, free charge can only be transported in the liquid phase, through ohmic conduction and convection. However, charge may leak into the dielectric air during simulation because the interface and the free charge are transported asynchronously. To avoid this unphysical error, the face discernment method produces accurate ohmic conduction of free charge by providing a better distribution of physical properties at the interface. The flux correction method then corrects the advection flux of charge density to prevent ions from crossing the interface. Both schemes are based on the Volume of Fluid (VOF) model and are independent of the specific interface updating method. The performance of the proposed methods is carefully validated with several test cases. The algorithms are implemented as an OpenFOAM extension and published as open source.
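The flux correction idea, stopping charge from being advected across the interface, can be illustrated in one dimension. This is a minimal sketch, not the paper's OpenFOAM implementation: it assumes first-order upwind advection on a closed 1D grid and a hypothetical rule that a face carries charge flux only when both neighbouring cells are liquid (VOF fraction at least 0.5). Ohmic conduction and the face discernment scheme are not modelled.

```python
def advect_charge(q, alpha, u, dt, dx, liquid_threshold=0.5):
    """One explicit upwind step of dq/dt + d(u q)/dx = 0 with a sketch of flux correction.

    q:     charge density per cell
    alpha: VOF liquid fraction per cell (1 = liquid, 0 = gas)
    u:     uniform advection velocity
    """
    n = len(q)
    # flux[f] lives on the face between cells f-1 and f; flux[0] and flux[n] stay 0 (closed ends).
    flux = [0.0] * (n + 1)
    for f in range(1, n):
        left, right = f - 1, f
        donor = left if u > 0 else right
        # Hypothetical correction rule: no charge may cross a face that touches a gas cell.
        if alpha[left] >= liquid_threshold and alpha[right] >= liquid_threshold:
            flux[f] = u * q[donor]
    # Finite-volume update: each cell changes by the net flux through its two faces.
    return [q[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(n)]

new_q = advect_charge([0.0, 1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0, 0.0], u=1.0, dt=0.5, dx=1.0)
```

In this example, charge that plain upwinding would move from the last liquid cell into the first gas cell stays in the liquid, so the gas cells remain exactly uncharged while total charge is conserved.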

Preprint · 2022 · arXiv

Electrocardio Panorama: Synthesizing New ECG Views with Self-supervision

A multi-lead electrocardiogram (ECG) provides clinical information about heartbeats from several fixed viewpoints determined by lead positioning. However, visualizing ECG signals in these fixed and limited views is often unsatisfactory, as some clinically useful information is represented only from a few specific ECG viewpoints. We propose a new concept, Electrocardio Panorama, which for the first time allows ECG signals to be visualized from any queried viewpoint. To build Electrocardio Panorama, we assume that an underlying electrocardio field exists, representing the locations, magnitudes, and directions of ECG signals. We present a Neural electrocardio field Network (Nef-Net), which first predicts the electrocardio field representation from a sparse set of one or a few input ECG views and then synthesizes Electrocardio Panorama based on the predicted representations. Specifically, to better disentangle electrocardio field information from viewpoint biases, a new Angular Encoding is proposed to process viewpoint angles. We also propose a self-supervised learning approach called Standin Learning, which helps model the electrocardio field without direct supervision. Further, with very few modifications, Nef-Net can also synthesize ECG signals from scratch. Experiments verify that Nef-Net performs well on Electrocardio Panorama synthesis and outperforms previous work on the auxiliary tasks (ECG view transformation and ECG synthesis from scratch). The code and the division labels of cardiac cycles and ECG deflections on the Tianchi ECG and PTB datasets are available at https://github.com/WhatAShot/Electrocardio-Panorama.

Preprint · 2022 · arXiv

Explanation of nearby SNRs for primary electron excess and proton spectral bump

Several groups have reported a possible excess of primary electrons at high energies from joint fits of the positron fraction and the total electron/positron spectra. With the latest release of high-precision electron/positron spectra measured by AMS-02, we further confirm this excess by fitting $\Delta\Phi$ (i.e., $\Phi_{e^-}-\Phi_{e^+}$) data in this work. We then investigate the contribution of a single nearby supernova remnant to the primary electron excess and find that Monogem can reasonably account for it. Moreover, we predict that the electron spectrum may harden again at a few TeV due to Vela's contribution. DAMPE, which can accurately measure electrons at the TeV scale, is expected to provide a robust test of this new spectral feature in the near future. Finally, we fit the DAMPE proton spectrum data with Monogem or Loop I and find that both the primary electron excess and the proton spectral bump could be mainly generated by Monogem.
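The reason for fitting $\Delta\Phi$ can be written out explicitly. Assuming, as is common in such analyses though not stated in the abstract, that secondary production and any additional charge-symmetric pair source (for example pulsars) contribute electrons and positrons in roughly equal amounts, the difference isolates the primary electrons:

```latex
\begin{aligned}
\Phi_{e^-} &= \Phi^{\rm pri}_{e^-} + \Phi^{\rm sec}_{e^-} + \Phi^{\rm pair}_{e^-}, \qquad
\Phi_{e^+} = \Phi^{\rm sec}_{e^+} + \Phi^{\rm pair}_{e^+},\\
\Delta\Phi &= \Phi_{e^-} - \Phi_{e^+}
 = \Phi^{\rm pri}_{e^-}
 + \left(\Phi^{\rm sec}_{e^-} - \Phi^{\rm sec}_{e^+}\right)
 + \left(\Phi^{\rm pair}_{e^-} - \Phi^{\rm pair}_{e^+}\right)
 \approx \Phi^{\rm pri}_{e^-}.
\end{aligned}
```

Under this assumption, an excess in $\Delta\Phi$ points to extra primary electrons rather than to a pair source.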

Preprint · 2022 · arXiv

Femtosecond rotational dynamics of D$_2$ molecules in superfluid helium nanodroplets

Rotational dynamics of D$_2$ molecules inside helium nanodroplets are induced by a moderately intense femtosecond (fs) pump pulse and measured as a function of time by recording the yield of HeD$^+$ ions, created through strong-field dissociative ionization with a delayed fs probe pulse. The yield oscillates with a period of 185 fs, reflecting field-free rotational wave packet dynamics, and the oscillation persists for more than 500 periods. Within the experimental uncertainty, the rotational constant $B_{\rm He}$ of the in-droplet D$_2$ molecule, determined by Fourier analysis, is the same as $B_{\rm gas}$ for an isolated D$_2$ molecule. Our observations show that D$_2$ molecules inside helium nanodroplets essentially rotate as free D$_2$ molecules.
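A back-of-envelope consistency check on the numbers above, assuming (not stated in the abstract) that the 185 fs period is the J=0↔2 beat of a rigid rotor with levels E_J = B J(J+1), so that the beat energy is 6B. This does not replace the paper's Fourier analysis.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def rotational_constant_cm(period_s, spacing_in_units_of_b=6):
    """Rotational constant B (cm^-1) from a rotational beat period.

    Assumes a rigid rotor, E_J = B J (J + 1), where the J=0 <-> J=2 spacing is 6B.
    Beat frequency = spacing * c, hence B = 1 / (spacing * c * period).
    """
    return 1.0 / (spacing_in_units_of_b * C_CM_PER_S * period_s)

b_cm = rotational_constant_cm(185e-15)
window_ps = 500 * 185e-15 * 1e12  # time span covered by 500 oscillation periods
```

This gives B of roughly 30 cm^-1, in line with the known gas-phase rotational constant of D2, and shows that "more than 500 periods" corresponds to a coherent signal lasting beyond about 92.5 ps.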

Preprint · 2022 · arXiv

Identifying Electrocardiogram Abnormalities Using a Handcrafted-Rule-Enhanced Neural Network

A large number of people suffer from life-threatening cardiac abnormalities, and electrocardiogram (ECG) analysis helps determine whether an individual is at risk of such abnormalities. Automatic ECG classification methods, especially deep-learning-based ones, have been proposed to detect cardiac abnormalities from ECG records, showing good potential to improve clinical diagnosis and support early prevention of cardiovascular diseases. However, the predictions of known neural networks still do not satisfactorily meet clinicians' needs, which suggests that some information used in clinical diagnosis may not be well captured and utilized by these methods. In this paper, we introduce handcrafted rules into convolutional neural networks to bring clinical knowledge into deep-learning-based ECG analysis and thereby improve automated ECG diagnosis. Specifically, we propose a Handcrafted-Rule-enhanced Neural Network (HRNN) for ECG classification with standard 12-lead ECG input, which consists of a rule inference module and a deep learning module. Experiments on two large-scale public ECG datasets show that our new approach considerably outperforms existing state-of-the-art methods. Further, our approach not only improves diagnostic performance but also helps detect mislabelled ECG samples. Our code is available at https://github.com/alwaysbyx/ecg_processing.
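The abstract does not list HRNN's rules or how the two modules are combined. As an illustration only, here is a minimal sketch of a rule module using heart-rate rules with the common adult cut-offs of 60 and 100 bpm, plus a hypothetical weighted fusion with network probabilities; `rule_flags`, `fuse` and `rule_weight` are illustrative names, not the paper's design.

```python
def heart_rate_bpm(r_peak_times_s):
    """Mean heart rate from R-peak timestamps given in seconds."""
    rr = [later - earlier for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]
    return 60.0 / (sum(rr) / len(rr))

def rule_flags(r_peak_times_s):
    """Handcrafted rate rules using common adult resting cut-offs."""
    hr = heart_rate_bpm(r_peak_times_s)
    return {"bradycardia": hr < 60.0, "tachycardia": hr > 100.0}

def fuse(nn_probs, flags, rule_weight=0.5):
    """Hypothetical fusion: blend the network probability with the 0/1 rule output
    for labels that have a rule; labels without a rule keep the network score."""
    return {
        label: (1 - rule_weight) * p + rule_weight * float(flags[label]) if label in flags else p
        for label, p in nn_probs.items()
    }

fused = fuse({"tachycardia": 0.4, "atrial_fibrillation": 0.3}, rule_flags([0.0, 0.5, 1.0, 1.5]))
```

Here R peaks every 0.5 s mean 120 bpm, so the tachycardia rule fires and lifts the network's 0.4 to 0.7, while the label with no rule is left unchanged.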

Preprint · 2022 · arXiv

Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function

Large-scale 3D reconstruction, texturing, and semantic mapping are nowadays widely used for automated driving vehicles, virtual reality, and automatic data generation. However, most approaches are developed for RGB-D cameras with colored dense point clouds and are not suitable for large-scale outdoor environments using sparse LiDAR point clouds. Moreover, since a 3D surface is usually observed in multiple camera images with different view poses, selecting the optimal image patch for texturing and estimating the optimal semantic class for semantic mapping remain challenging. To address these problems, we propose a novel 3D reconstruction, texturing, and semantic mapping system using LiDAR and camera sensors. An Adaptive Truncated Signed Distance Function is introduced to describe surfaces implicitly; it can deal with different LiDAR point sparsities and improves model quality. The triangle mesh map extracted from this implicit function is then textured from a series of registered camera images by applying an optimal image patch selection strategy. In addition, a Markov Random Field-based data fusion approach is proposed to estimate the optimal semantic class for each triangle mesh. Our approach is evaluated on a synthetic dataset, the KITTI dataset, and a dataset recorded with our experimental vehicle. The results show that the 3D models generated by our approach are more accurate than those of other state-of-the-art approaches. The texturing and semantic mapping also achieve very promising results.
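The abstract says the Adaptive TSDF copes with varying LiDAR point sparsity but does not give the adaptation rule. This is a minimal sketch of the standard TSDF fusion step it builds on (truncation followed by a running weighted average per voxel), combined with a hypothetical rule that scales the truncation distance with local point spacing; `adaptive_truncation` and its constants are assumptions.

```python
def adaptive_truncation(point_spacing, k=3.0, tau_min=0.1, tau_max=1.0):
    """Hypothetical rule: use a wider truncation band where LiDAR points are sparser."""
    return min(tau_max, max(tau_min, k * point_spacing))

def tsdf_update(voxel, voxel_depth, measured_depth, tau, obs_weight=1.0):
    """Fuse one range observation into a voxel stored as (tsdf, weight).

    Signed distance is positive in front of the measured surface. Observations
    far behind the surface (sdf < -tau) leave the voxel unchanged.
    """
    sdf = measured_depth - voxel_depth
    if sdf < -tau:
        return voxel
    d = min(1.0, sdf / tau)  # truncate and normalize to [-1, 1]
    tsdf, weight = voxel
    new_weight = weight + obs_weight
    # Running weighted average of truncated distances.
    return ((tsdf * weight + d * obs_weight) / new_weight, new_weight)

v = tsdf_update((0.0, 0.0), voxel_depth=2.0, measured_depth=2.5, tau=1.0)
v = tsdf_update(v, voxel_depth=2.0, measured_depth=3.5, tau=1.0)
```

After two observations the voxel holds (0.75, 2.0): the average of the truncated distances 0.5 and 1.0. The zero crossing of the fused field is where a mesh extractor such as marching cubes would place the surface.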

Preprint · 2022 · arXiv

Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study

An Xception model reaches state-of-the-art (SOTA) accuracy on the ESC-50 dataset for audio event detection through knowledge transfer from ImageNet weights, pretraining on AudioSet, and an on-the-fly data augmentation pipeline. This paper presents an ablation study that analyzes which components contribute to the boost in performance and training time. A smaller Xception model is also presented which nears SOTA performance with almost a third of the parameters.

Preprint · 2022 · arXiv

Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages

It is challenging to train and deploy Transformer LMs for second-pass re-ranking in hybrid speech recognition for low-resource languages due to (1) data scarcity in low-resource languages, (2) the expensive computing costs of training and refreshing 100+ monolingual models, and (3) hosting inefficiency given sparse traffic. In this study, we present a new way to group multiple low-resource locales together and optimize the performance of Multilingual Transformer LMs in ASR. Our Locale-group Multilingual Transformer LMs outperform traditional multilingual LMs while reducing maintenance costs and operating expenses. Further, for low-resource but high-traffic locales where deploying monolingual models is feasible, we show that fine-tuning our locale-group multilingual LMs produces better monolingual LM candidates than baseline monolingual LMs.

Preprint · 2022 · arXiv

Online Deep Learning from Doubly-Streaming Data

This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) data samples that flow in ceaselessly may carry patterns that shift over time, requiring learners to update, and hence adapt, on the fly; 2) newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome these challenges is to establish a relationship between the pre- and post-evolution feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer from a trade-off between onlineness (favoring shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, in which a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is that it treats model capacity as learnable, yielding the optimal model depth and parameters jointly in an online fashion, in accordance with the complexity and non-linearity of the input data streams. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.

Preprint · 2022 · arXiv

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation

The synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best restores a given sound timbre, known as the synthesizer parameter estimation problem, is important yet complicated. We propose a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem. Our method achieves not only state-of-the-art results but also the first real-world applicable results on Dexed, a popular FM synthesizer.

Preprint · 2022 · arXiv

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi-talker ASR models that use multiple output branches, the t-SOT model has only a single output branch that generates recognition tokens (e.g., words, subwords) of multiple speakers in chronological order based on their emission times. A special token that indicates a change of "virtual" output channels is introduced to keep track of overlapping utterances. Compared with prior streaming multi-talker ASR models, the t-SOT model has the advantages of lower inference cost and a simpler model architecture. Moreover, in our experiments with the LibriSpeechMix and LibriCSS datasets, the t-SOT-based transformer transducer model achieves state-of-the-art word error rates, improving on prior results by a significant margin. For non-overlapping speech, the t-SOT model is on par with a single-talker ASR model in terms of both accuracy and computational cost, opening the door to deploying one model for both single- and multi-talker scenarios.
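The serialization described above, one token stream ordered by emission time with a special token marking each change of virtual channel, can be sketched directly. This is a minimal sketch of building and inverting the training target only (no transducer model), assuming two virtual channels, that channel 0 emits first, made-up emission times, and a placeholder token name `<cc>` since the abstract gives no symbol.

```python
CC = "<cc>"  # placeholder name for the channel-change token

def serialize_t_sot(channels):
    """Merge per-channel lists of (emission_time, token) into one stream ordered
    by emission time, inserting CC whenever the active channel switches."""
    events = sorted(
        (time, channel, token)
        for channel, utterance in enumerate(channels)
        for time, token in utterance
    )
    stream, prev = [], None
    for _, channel, token in events:
        if prev is not None and channel != prev:
            stream.append(CC)
        stream.append(token)
        prev = channel
    return stream

def deserialize_t_sot(stream):
    """Two-channel inverse: start on channel 0 and toggle at every CC."""
    channels, active = ([], []), 0
    for token in stream:
        if token == CC:
            active = 1 - active
        else:
            channels[active].append(token)
    return [list(c) for c in channels]

speaker_a = [(0.0, "hello"), (0.5, "world")]
speaker_b = [(0.3, "good"), (0.8, "morning")]
stream = serialize_t_sot([speaker_a, speaker_b])
```

For these overlapping utterances the stream alternates `hello <cc> good <cc> world <cc> morning`, and toggling at each `<cc>` recovers both transcripts. For non-overlapping speech no `<cc>` is emitted within an utterance, which is why a single-branch model can also serve single-talker input.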

Preprint · 2022 · arXiv

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what" with low latency even when multiple people are speaking simultaneously. Our model is based on token-level serialized output training (t-SOT), which was recently proposed to transcribe multi-talker speech in a streaming fashion. To further recognize speaker identities, we propose an encoder-decoder based speaker embedding extractor that can estimate a speaker representation for each recognized token from both non-overlapping and overlapping speech. The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription at low latency. We evaluate the proposed model on a joint task of ASR and SID/SD using the LibriSpeechMix and LibriCSS corpora. The proposed model achieves substantially better accuracy than a prior streaming model and shows comparable, or sometimes even superior, results to the state-of-the-art offline SA-ASR model.

Preprint · 2022 · arXiv

The effect of $f$-$c$ hybridization on the $γ\rightarrowα$ phase transition of cerium studied by lanthanum doping

The hybridization between the localized 4$f$ level ($f$) and conduction ($c$) states in $γ$-Ce upon cooling has previously been revealed experimentally and theoretically in single-crystalline thin films, whereas its influence on the $γ\rightarrowα$ phase transition was not explicitly verified, because the phase transition occurs in the bulk layer and leaves the surface in the $γ$ phase. Here, we circumvent this issue by investigating the effect of alloying Ce with La, by means of crystal structure, electronic transport, and ARPES measurements, together with a phenomenological periodic Anderson model and a modified Anderson impurity model. Our results indicate that the weakening of $f$-$c$ hybridization is the major factor in the suppression of the $γ\rightarrowα$ phase transition by La doping. The consistency of our results with the effects of other rare earth and actinide alloying additions on the $γ\rightarrowα$ phase transition of Ce is also discussed. Our work demonstrates the importance of the interaction between $f$ and $c$ electrons for understanding the unconventional phase transition in Ce, which is instructive for further research on other rare earth and actinide metals and alloys with similar phase transition behaviors.

preprint2022arXiv

Ultra Fast Speech Separation Model with Teacher Student Learning

The Transformer has recently been applied successfully to speech separation, owing to the strong long-range dependency modeling capacity of its self-attention mechanism. However, the Transformer tends to incur heavy run-time costs because of its deep encoder layers, which hinders its deployment on edge devices. A small Transformer model with fewer encoder layers is preferred for computational efficiency, but it is prone to performance degradation. In this paper, an ultra fast speech separation Transformer model is proposed that achieves both better performance and better efficiency through teacher-student learning (T-S learning). We introduce layer-wise T-S learning and objective shifting mechanisms to guide the small student model to learn intermediate representations from the large teacher model. Compared with the small Transformer model trained from scratch, the proposed T-S learning method reduces the word error rate (WER) by more than 5% for both multi-channel and single-channel speech separation on the LibriCSS dataset. Utilizing more unlabeled speech data, our ultra fast speech separation models achieve more than 10% relative WER reduction.
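The layer-wise T-S idea can be illustrated with a small loss function. This is a hypothetical sketch, not the paper's implementation: it assumes each student layer is matched to the last teacher layer of an evenly sized block of teacher layers and that intermediate representations are compared with mean squared error. The objective shifting mechanism is not modeled, and the names `mse` and `layerwise_ts_loss` are illustrative.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layerwise_ts_loss(student_layers, teacher_layers):
    """Hypothetical layer-wise teacher-student loss.

    Each student layer output is matched to the last teacher layer of an
    evenly sized block of teacher layers (an assumed mapping), and the
    mean squared errors are averaged over the student layers.
    """
    if len(teacher_layers) < len(student_layers):
        raise ValueError("teacher must have at least as many layers as student")
    stride = len(teacher_layers) // len(student_layers)
    total = 0.0
    for i, student_out in enumerate(student_layers):
        teacher_out = teacher_layers[(i + 1) * stride - 1]
        total += mse(student_out, teacher_out)
    return total / len(student_layers)


# Toy example: a 4-layer teacher guiding a 2-layer student.
teacher = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
student = [[0.0, 0.0], [3.0, 3.0]]
print(layerwise_ts_loss(student, teacher))  # 0.5
```

In training, a term like this would be combined with the separation objective; how the two are weighted over time is left open here.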

preprint2022arXiv

What Can Machine Vision Do for Lymphatic Histopathology Image Analysis: A Comprehensive Review

Over the past ten years, the computing power available to machine vision (MV) has continuously improved, and image analysis algorithms have developed rapidly. At the same time, histopathological slices can now be stored as digital images. As a result, MV algorithms can provide doctors with diagnostic references. In particular, continuous improvements in deep learning algorithms have further increased the accuracy of MV in disease detection and diagnosis. This paper reviews recent applications of MV-based image processing to lymphoma histopathological images, including segmentation, classification, and detection. Finally, current methods are analyzed, promising methods are proposed, and future prospects are discussed.

preprint2022arXiv

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even when the pre-training objective is designed for speech recognition. In this paper, we study which factors lead to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. Our empirical results on the VoxCeleb1 dataset suggest that the benefit of SSL for the SV task stems from a combination of the masked speech prediction loss, data scale, and model size, while the SSL quantizer has a minor impact. We further employ the integrated gradients attribution method and loss landscape visualization to understand why self-supervised learning is effective for speaker recognition.

preprint2022arXiv

Zeptosecond Angular Streak Camera

Time-resolved electronic processes on the attosecond scale have recently become experimentally accessible through the development of laser-based pump-probe interrogation techniques such as the attosecond streak camera, the reconstruction of attosecond beating by interference of two-photon transitions, and the attoclock. In this work, we demonstrate that by combining the concepts of the attosecond streak camera and the attoclock, time-resolved processes down to the time scale of tens of zeptoseconds come within reach. The key to reaching this level of time precision with this method, termed the zeptosecond angular streak camera (ZASC), is its substantial intrinsic time-information redundancy. The ZASC yields a remarkably simple streaking trace that is largely independent of the precise temporal structure of the streaking pulse, thereby bypassing the need for detailed characterization of the streaking field. Moreover, it can retrieve information on the duration of the pump pulse. It can also reach attosecond-level precision in a single-shot mode, which may be useful for free-electron-laser experiments. This concept promises to open pathways towards the chronoscopy of zeptosecond-level ultrafast processes.

preprint2021arXiv

A Comprehensive Review of Computer-aided Whole-slide Image Analysis: from Datasets to Feature Extraction, Segmentation, Classification, and Detection Approaches

With the development of computer-aided diagnosis (CAD) and image scanning technology, Whole-slide Image (WSI) scanners are widely used in pathological diagnosis. Therefore, WSI analysis has become central to modern digital pathology. Since 2004, WSI has been increasingly used in CAD. Because machine vision methods are usually semi-automatic or fully automatic, they are highly efficient and labor-saving. The combination of WSI and CAD technologies for segmentation, classification, and detection helps histopathologists obtain more stable and quantitative analysis results, save labor costs, and improve diagnostic objectivity. This paper reviews machine learning-based methods for WSI analysis. First, the development status of WSI and CAD methods is introduced. Second, we discuss publicly available WSI datasets and evaluation metrics for segmentation, classification, and detection tasks. Then, the latest developments of machine learning in WSI segmentation, classification, and detection are reviewed in turn. Finally, the existing methods are examined, their applicability is analyzed, and their application prospects in this field are discussed.

preprint2021arXiv

A Synthetic Prediction Market for Estimating Confidence in Published Work

Explainably estimating confidence in published scholarly work offers an opportunity for faster and more robust scientific progress. We develop a synthetic prediction market to assess the credibility of published claims in the social and behavioral sciences literature. We demonstrate our system and detail our findings using a collection of known replication projects. We suggest that this work lays the foundation for a research agenda that creatively uses AI for peer review.

preprint2021arXiv

Document Domain Randomization for Deep Learning Document Layout Extraction

We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to real-world document segmentation. DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest, with user-defined layout and font styles to support joint learning of fine-grained classes. We demonstrate competitive results using our DDR approach to extract nine document classes from the benchmark CS-150 and from papers published in two domains, namely the annual meetings of the Association for Computational Linguistics (ACL) and IEEE Visualization (VIS). We compare DDR under conditions of style mismatch and with fewer or noisier samples, which are more easily obtained in the real world. We show that high-fidelity semantic information is not necessary to label semantic classes, but style mismatch between training and test data can lower model accuracy. Using smaller training samples had a slightly detrimental effect. Finally, network models still achieved high test accuracy when correct labels were diluted towards confusing labels; this behavior holds across several classes.

preprint2021arXiv

Femtosecond dynamics of a polariton bosonic cascade at room temperature

Whispering gallery modes in a microwire are characterized by a nearly equidistant energy spectrum. In the strong exciton-photon coupling regime, this system represents a bosonic cascade: a ladder of discrete energy levels that sustains stimulated transitions between neighboring steps. In this work, using a femtosecond angle-resolved spectroscopic imaging technique, we explicitly visualize the ultrafast dynamics of polaritons in a bosonic cascade based on a one-dimensional ZnO whispering gallery microcavity. A clear ladder-like build-up from higher to lower energy branches of the polariton condensates is observed and is well reproduced by a rate-equation model. Moreover, the polariton parametric scattering dynamics are resolved on a timescale of hundreds of femtoseconds. Our understanding of the femtosecond condensation and scattering dynamics paves the way towards ultrafast coherent control of polaritons at room temperature, which is promising for high-speed all-optical integrated applications.

preprint2021arXiv

Flow-Mixup: Classifying Multi-labeled Medical Images with Corrupted Labels

In clinical practice, medical image interpretation often involves multi-labeled classification, since the affected parts of a patient tend to present multiple symptoms or comorbidities. Recently, deep learning based frameworks have attained expert-level performance on medical image interpretation, which can be attributed partially to large amounts of accurate annotations. However, manually annotating massive amounts of medical images is impractical, while automatic annotation is fast but imprecise (possibly introducing corrupted labels). In this work, we propose a new regularization approach, called Flow-Mixup, for multi-labeled medical image classification with corrupted labels. Flow-Mixup guides the models to capture robust features for each abnormality, thus helping handle corrupted labels effectively and making it possible to apply automatic annotation. Specifically, Flow-Mixup decouples the extracted features by adding constraints to the hidden states of the models. Also, Flow-Mixup is more stable and effective than other known regularization methods, as shown by theoretical and empirical analyses. Experiments on two electrocardiogram datasets and a chest X-ray dataset containing corrupted labels verify that Flow-Mixup is effective and insensitive to corrupted labels.

preprint2021arXiv

Numerical analysis of electrohydrodynamic (EHD) instability in dielectric liquid-gas flows subjected to unipolar injection

In this work, the study of electrohydrodynamic (EHD) instability induced by unipolar charge injection is extended from a single-phase dielectric liquid to a two-phase system with a liquid-air interface. A two-phase solver based on the volume of fluid (VOF) model was developed with simplified Maxwell equations implemented in the open-source platform OpenFOAM. The numerically obtained critical value for linear stability agrees well with the theoretical value. To highlight the effect of the slip boundary at the interface, deformation of the interface is ignored. In this situation, a bifurcation diagram with a hysteresis loop linking the linear and finite-amplitude criteria was obtained, with a finite-amplitude criterion of $U_f = 0.059$. It is concluded that the lack of viscous effects at the interface leads to a significant increase in flow intensity, which is the reason for the smaller instability threshold in the two-phase system. The presence of the interface also changes the flow structure and shifts the flow vortices closer to the interface.

preprint2021arXiv

Speaker attribution with voice profiles by graph-based semi-supervised learning

Speaker attribution is required in many real-world applications, such as meeting transcription, where a speaker identity is assigned to each utterance according to speaker voice profiles. In this paper, we propose to solve the speaker attribution problem by using graph-based semi-supervised learning methods. A graph of speech segments is built for each session, in which segments from voice profiles are represented by labeled nodes and segments from test utterances by unlabeled nodes. The edge weights between nodes are given by the similarities between the pretrained speaker embeddings of the speech segments. Speaker attribution then becomes a semi-supervised learning problem on graphs, to which two graph-based methods are applied: label propagation (LP) and graph neural networks (GNNs). The proposed approaches are able to exploit the structural information of the graph to improve speaker attribution performance. Experimental results on real meeting data show that the graph-based approaches reduce speaker attribution error by up to 68% compared with a baseline speaker identification approach that processes each utterance independently.
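The label propagation (LP) variant can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: edge weights are cosine similarities clipped at zero, voice-profile nodes are clamped to one-hot labels, and each test node repeatedly takes the weighted average of its neighbours' label distributions. The function names and the iteration count are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def label_propagation(embeddings, labels, num_speakers, iters=50):
    """Assign speakers to unlabeled segments by iterative label propagation.

    labels[i] is a speaker index for voice-profile segments and None for
    test segments. Profile nodes are clamped to their one-hot labels.
    """
    n = len(embeddings)
    # Edge weights: cosine similarity clipped at zero, no self-loops.
    w = [[max(0.0, cosine(embeddings[i], embeddings[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Start from one-hot distributions for labeled nodes, uniform otherwise.
    f = [[1.0 / num_speakers] * num_speakers if lab is None
         else [1.0 if k == lab else 0.0 for k in range(num_speakers)]
         for lab in labels]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if labels[i] is not None or total == 0.0:
                new_f.append(f[i])  # clamped profile node or isolated node
                continue
            # Weighted average of the neighbours' label distributions.
            new_f.append([sum(w[i][j] * f[j][k] for j in range(n)) / total
                          for k in range(num_speakers)])
        f = new_f
    return [max(range(num_speakers), key=lambda k: f[i][k]) for i in range(n)]


# Two voice-profile segments (speakers 0 and 1) and three test segments.
embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
labels = [0, 1, None, None, None]
print(label_propagation(embs, labels, num_speakers=2))  # [0, 1, 0, 1, 0]
```

Because test nodes also exchange labels with each other, a segment that is only weakly similar to any profile can still be attributed through its neighbours, which is the structural information a per-utterance baseline ignores.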

preprint2020arXiv

An End-to-end Architecture of Online Multi-channel Speech Separation

Multi-speaker speech recognition has been one of the key challenges in conversation transcription, as it breaks the single-active-speaker assumption employed by most state-of-the-art speech recognition systems. Speech separation is considered a remedy to this problem. Previously, we introduced a system called unmixing, fixed-beamformer and extraction (UFE), which was shown to be effective in addressing the speech overlap problem in conversation transcription. With UFE, an input mixed signal is processed by fixed beamformers, followed by neural network post-filtering. Although promising results were obtained, the system contains multiple individually developed modules, potentially leading to sub-optimal performance. In this work, we introduce an end-to-end modeling version of UFE. To enable gradient propagation all the way through, an attentional selection module is proposed, in which an attentional weight is learnt for each beamformer and spatial feature sampled over space. Experimental results show that the proposed system achieves performance comparable to the original separate-processing-based pipeline in an offline evaluation, while producing remarkable improvements in an online evaluation.
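Why an attentional selection module permits end-to-end gradients can be shown with a soft selection over beamformer outputs. This is a minimal sketch assuming a standard softmax weighting over per-beam scores; how the paper computes those scores from spatial features is not described here, so the scores are simply taken as inputs, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentional_selection(beam_features, scores):
    """Soft selection over fixed-beamformer outputs.

    beam_features: one feature vector per beamformer (equal lengths).
    scores: one unnormalized attention score per beamformer; in a real
    system these would come from a learned network (not modeled here).
    Returns the attention-weighted feature vector and the weights.
    """
    weights = softmax(scores)
    dim = len(beam_features[0])
    combined = [sum(w * feat[d] for w, feat in zip(weights, beam_features))
                for d in range(dim)]
    return combined, weights


beams = [[1.0, 0.0], [0.0, 1.0]]
print(attentional_selection(beams, [0.0, 0.0])[0])  # [0.5, 0.5]
```

Unlike a hard argmax over beams, the softmax weighting is differentiable with respect to the scores, so a downstream loss can be propagated back into whatever network produces them.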

preprint2020arXiv

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs of AVSR systems, i.e., end-to-end and hybrid, are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state of the art for the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest that the proposed AVSR system outperforms the audio-only baseline LF-MMI DNN system by up to 29.98% absolute word error rate (WER) reduction and produces recognition performance comparable to a more complex pipelined system. Consistent improvements of 4.89% absolute WER reduction over the baseline AVSR system using feature fusion are also obtained.

preprint2020arXiv

Continuous speech separation: dataset and analysis

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances that are mostly fully overlapped, and the algorithms are evaluated based on signal-to-distortion ratio or similar performance metrics. However, in natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components. In addition, the signal-based metrics have very weak correlations with automatic speech recognition (ASR) accuracy. We think that not only does this make it hard to assess the practical relevance of the tested algorithms, it also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped to varying degrees. A new real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones. A Kaldi-based ASR evaluation protocol is also established by using a well-trained multi-conditional acoustic model. Using this dataset, several aspects of a recently proposed speaker-independent CSS algorithm are investigated. The dataset and evaluation scripts are available to facilitate research in this direction.

preprint2020arXiv

Large Scale Subject Category Classification of Scholarly Papers with Deep Attentive Neural Networks

Subject categories of scholarly papers generally refer to the knowledge domain(s) to which the papers belong, examples being computer science or physics. Subject category information can be used for building faceted search for digital library search engines. This can significantly assist users in narrowing down their search space of relevant documents. Unfortunately, many academic papers do not have such information as part of their metadata. Existing methods for this task usually focus on unsupervised learning that often relies on citation networks. However, a complete list of papers citing the current paper may not be readily available. In particular, new papers with few or no citations cannot be classified using such methods. Here, we propose a deep attentive neural network (DANN) that classifies scholarly papers using only their abstracts. The network is trained using 9 million abstracts from Web of Science (WoS). We also use the WoS schema, which covers 104 subject categories. The proposed network consists of two bi-directional recurrent neural networks followed by an attention layer. We compare our model against baselines by varying the architecture and text representation. Our best model achieves a micro-F1 measure of 0.76, with F1 scores of individual subject categories ranging from 0.50 to 0.95. The results show the importance of retraining word embedding models to maximize vocabulary overlap and the effectiveness of the attention mechanism. The combination of word vectors with TF-IDF outperforms character- and sentence-level embedding models. We discuss imbalanced samples and overlapping categories and suggest possible mitigation strategies. We also determine the subject category distribution in CiteSeerX by classifying a random sample of one million academic papers.

preprint2020arXiv

Limitation of Finite Difference Scheme in Electroconvection with Unipolar Charge Injection: A base-state Analysis

The 1D hydrostatic base state of electroconvection driven by unipolar charge injection between two parallel electrodes is investigated using a finite difference method. A boundary layer near the anode surface is derived analytically. The computational grid must resolve this boundary layer to maintain high-order accuracy.

preprint2020arXiv

NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge

This paper describes the NPU system submitted to the Interspeech 2020 Far-Field Speaker Verification Challenge (FFSVC). We focus particularly on far-field text-dependent SV from single (task 1) and multiple (task 3) microphone arrays. The major challenges in such scenarios are short utterances and cross-channel and distance mismatch between enrollment and test. Believing that better speaker embeddings can alleviate the effects of short utterances, we introduce a new speaker embedding architecture, ResNet-BAM, which integrates a bottleneck attention module with ResNet as a simple and efficient way to further improve the representation power of ResNet. This contribution brings up to 1% EER reduction. We further address the mismatch problem in three directions. First, domain adversarial training, which aims to learn domain-invariant features, yields a 0.8% EER reduction. Second, front-end signal processing, including WPE and beamforming, has no obvious contribution on its own, but together with data selection and domain adversarial training it contributes a further 0.5% EER reduction. Finally, data augmentation, combined with a specifically designed data selection strategy, leads to a 2% EER reduction. With all of the above contributions, our single submission system (without multi-system fusion) achieves first and second place on task 1 and task 3, respectively, in the mid-challenge results.

preprint2020arXiv

Privileged Features Distillation at Taobao Recommendations

Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee consistency between off-line training and on-line serving, we usually use only features that are available in both. However, this consistency requirement in turn neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detail page are informative. However, CVR prediction must be performed for on-line ranking before the click happens, so such post-event features cannot be obtained during serving. We define features that are discriminative but only available during training as privileged features. Inspired by distillation techniques, which bridge the gap between training and inference, in this work we propose privileged features distillation (PFD). We train two models: a student model that is the same as the original one, and a teacher model that additionally utilizes the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student to improve its accuracy. During serving, only the student part is extracted, and it relies on no privileged features. We conduct experiments on two fundamental prediction tasks at Taobao recommendations, i.e., click-through rate (CTR) at coarse-grained ranking and CVR at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, we achieve significant improvements over their strong baselines. In on-line A/B tests, the click metric is improved by +5.0% in the CTR task and the conversion metric by +2.3% in the CVR task. In addition, by addressing several issues in training PFD, we obtain training speed comparable to the baselines without any distillation.
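The teacher-to-student transfer can be expressed as a student loss that mixes the true label with the teacher's prediction. This is a hypothetical sketch of a common distillation form, not the paper's exact objective: the teacher, which also sees privileged features such as dwell time, is treated as a fixed soft target, and the weight `lam` and the function names are assumptions. Only the student's inputs would be available at serving time.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(target, prob, eps=1e-12):
    """Binary cross-entropy with a (possibly soft) target in [0, 1]."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def pfd_student_loss(label, student_logit, teacher_logit, lam=0.5):
    """Hypothetical PFD-style student loss.

    student_logit comes from a model that sees only serving-time features;
    teacher_logit comes from a model that also sees privileged features
    and is treated here as a fixed soft target. lam weights distillation.
    """
    p_student = sigmoid(student_logit)
    p_teacher = sigmoid(teacher_logit)
    return (1.0 - lam) * bce(label, p_student) + lam * bce(p_teacher, p_student)


# With lam=0 this reduces to ordinary cross-entropy on the click label.
print(round(pfd_student_loss(1, 0.0, 3.0, lam=0.0), 4))  # 0.6931
```

Setting `lam` between 0 and 1 lets the student benefit from the teacher's more informed estimate while still being anchored to the observed labels.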

preprint2020arXiv

Speaker diarization with session-level speaker embedding refinement using graph neural networks

Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be sub-optimal for distinguishing speakers locally in a specific meeting session. In this work, we present the first use of graph neural networks (GNNs) for the speaker diarization problem, using a GNN to refine speaker embeddings locally with the structural information between speech segments inside each session. The speaker embeddings extracted by a pre-trained model are remapped into a new embedding space, in which the different speakers within a single session are better separated. The model is trained for linkage prediction in a supervised manner by minimizing the difference between the affinity matrix constructed from the refined embeddings and the ground-truth adjacency matrix. Spectral clustering is then applied on top of the refined embeddings. We show that clustering with the refined speaker embeddings significantly outperforms clustering with the original embeddings on both simulated and real meeting data, and our system achieves the state-of-the-art result on the NIST SRE 2000 CALLHOME database.
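The training target, an affinity matrix compared against a ground-truth adjacency matrix, can be illustrated independently of the GNN. In this sketch the GNN that produces refined embeddings is omitted, the affinity is assumed to be cosine similarity, and "difference" is assumed to be mean squared error; the paper may use a different affinity or loss, and `affinity_loss` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def affinity_loss(embeddings, speakers):
    """Mean squared difference between the cosine affinity matrix of the
    (refined) embeddings and the ground-truth same-speaker adjacency matrix."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        for j in range(n):
            affinity = cosine(embeddings[i], embeddings[j])
            target = 1.0 if speakers[i] == speakers[j] else 0.0
            total += (affinity - target) ** 2
    return total / (n * n)


# Embeddings that already separate speakers "a" and "b" give zero loss.
print(affinity_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], ["a", "a", "b"]))  # 0.0
```

Minimizing such a loss pushes same-speaker segments towards affinity 1 and different-speaker segments towards 0, which is the kind of within-session separation that makes the subsequent spectral clustering easier.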

preprint2020arXiv

Theory of Subcycle Linear Momentum Transfer in Strong-Field Tunneling Ionization

The interaction of a strong laser pulse with matter transfers not only energy but also the linear momentum of the photons. Recent experimental advances have made it possible to detect the small amount of linear momentum delivered to photoelectrons in strong-field ionization of atoms. We present numerical simulations as well as an analytical description of the subcycle phase- (or time-) resolved momentum transfer to an atom accessible by an attoclock protocol. We show that the light-field-induced momentum transfer is remarkably sensitive to properties of the ultrashort laser pulse such as its carrier-envelope phase and ellipticity. Moreover, we show that the subcycle-resolved linear momentum transfer can provide novel insights into the interplay between nonadiabatic and nondipole effects in strong-field ionization. This work paves the way towards investigating the so-far unexplored time-resolved nondipole nonadiabatic tunneling dynamics.

preprint2020arXiv

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation

Recently, interest in deep learning-based semantic segmentation has been growing. UNet, a deep learning network with an encoder-decoder architecture, is widely used in medical image segmentation. Combining multi-scale features is one of the important factors for accurate segmentation. UNet++ was developed as a modified UNet by designing an architecture with nested and dense skip connections. However, it does not explore sufficient information from full scales, and there is still much room for improvement. In this paper, we propose a novel UNet 3+, which takes advantage of full-scale skip connections and deep supervision. The full-scale skip connections incorporate low-level details with high-level semantics from feature maps at different scales, while the deep supervision learns hierarchical representations from the full-scale aggregated feature maps. The proposed method is especially beneficial for organs that appear at varying scales. In addition to accuracy improvements, the proposed UNet 3+ can reduce the number of network parameters to improve computational efficiency. We further propose a hybrid loss function and devise a classification-guided module to enhance organ boundaries and reduce over-segmentation in non-organ images, yielding more accurate segmentation results. The effectiveness of the proposed method is demonstrated on two datasets. The code is available at: github.com/ZJUGiveLab/UNet-Version

preprint2019arXiv

Echo in a Single Molecule

Echo is a ubiquitous phenomenon found in many physical systems, ranging from spins in magnetic fields to particle beams in hadron accelerators. It is typically observed in inhomogeneously broadened ensembles of nonlinear objects and is used to eliminate the effects of environment-induced dephasing, enabling observation of the intrinsic properties of the objects. Here, we report the experimental observation of quantum wave packet echoes in a single isolated molecule. In contrast to conventional echoes, here the entire dephasing-rephasing cycle occurs within a single molecule, without any inhomogeneous spread of molecular properties or any interaction with the environment. In our experiments, we use a short laser pulse to impulsively excite a vibrational wave packet in an anharmonic molecular potential, and observe its oscillations and eventual dispersion over time. A second, delayed pulsed excitation is applied, giving rise to an echo: a partial recovery of the initial coherent wave packet. The vibrational dynamics of single molecules are visualized by a time-delayed probe pulse that dissociates them one at a time. Two mechanisms for the echo formation are discussed: ac Stark-induced shaking of the molecular potential and creation of a depletion-induced "hole" in the nuclear spatial distribution. The interplay between the optically induced echoes and quantum revivals of the vibrational wave packets is observed and theoretically analyzed. Single-molecule wave packet echoes may lead to the development of new tools for probing ultrafast intramolecular processes in various molecules.

preprint2019arXiv

Measurement of Beam-Correlated Background Neutrons from the Fermilab Booster Neutrino Beam in ANNIE Phase-I

The Accelerator Neutrino Neutron Interaction Experiment (ANNIE) aims to make a unique measurement of neutron yield from neutrino-nucleus interactions and to perform R&D for the next generation of water-based neutrino detectors. In this paper, we characterize beam-induced neutron backgrounds in the experimental hall at Fermi National Accelerator Laboratory. It is shown that the background levels are sufficiently low to allow the next stage of the experiment to proceed. These measurements are relevant to other Booster Neutrino Beam (BNB) experiments located adjacent to ANNIE Hall, where dirt neutrons and sky-shine could present similar backgrounds.

preprint2019arXiv

Searching for the possible signal of the photon-axionlike particle oscillation in the combined GeV and TeV spectra of supernova remnants

The conversion between photons and axionlike particles (ALPs) in the Milky Way magnetic field could result in detectable oscillations in the $γ$-ray spectra of Galactic sources. In this work, the GeV (Fermi-LAT) and TeV (MAGIC/VERITAS/H.E.S.S.) data of three bright supernova remnants (SNRs), i.e., IC443, W51C, and W49B, are combined to search for such an oscillation effect. Unlike our previous analysis of the Fermi-LAT data of IC443 alone, we do not find any reliable signal of photon-ALP oscillation in the joint broadband spectrum of any of the SNRs. The reason for this inconsistency is that in this work we use the latest revision (P8R3) of the Fermi-LAT data, updated diffuse emission templates, and the new version of the source catalog (4FGL), which lead to some modification of the GeV spectrum of IC443. We then set constraints on the ALP parameters based on the combined analysis of all three sources. Although these constraints are somewhat weaker than the limits from the CAST experiment and globular clusters, they are supportive of and complementary to these other results.

preprint2017arXiv

Large deformation and instability of soft hollow cylinder with surface effects

Surface stress, which is neglected in classical elastic theories, has recently emerged as playing a key role in the mechanics of highly deformable soft solids. In this paper, the effect of surface stress on the deformation and instability of a soft hollow cylinder is analyzed. By incorporating a surface energy density function into the constitutive model of a hyperelastic theory, explicit solutions are obtained for the deformation of a soft hollow cylinder under uniform pressure loading and geometric everting. It is found that surface tension noticeably alters the deformation of the soft cylinder. Specifically, the surface stiffness resists deformation, whereas the residual surface stress tends to promote larger deformation. The effects of surface stress on the instability of the soft hollow cylinder are also explored. For both the pressure loading and geometric everting conditions, significant changes in the critical condition for creasing are found as the surface parameter varies. The results of this work reveal that surface energy markedly influences both the deformation and the instability of soft hollow cylinders at finite deformation. The obtained results should help in understanding and accurately predicting the mechanical behavior of soft structures.