Source author record

Xu Sun

Xu Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected: 37 works, 22 topics, 4 close collaborators

preprint, 2026, arXiv

Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models

Multimodal reward models have advanced substantially in text and image domains, yet progress in video understanding reward modeling remains severely limited by the lack of robust evaluation benchmarks and high-quality preference data. To address this, we propose a unified framework spanning benchmark design, data construction, and reward model training. We introduce Video Understanding Reward Bench (VURB), a benchmark featuring 2,100 preference pairs with long chain-of-thought reasoning traces (averaging 1,143 tokens) and majority voting evaluation across general, long, and reasoning-oriented video tasks. We further construct Video Understanding Preference Dataset (VUP-35K) via a fully automated pipeline, providing large-scale high-quality supervision for video reward training. Building on the data, we train VideoDRM and VideoGRM, a discriminative and a generative reward model, both achieving state-of-the-art performance on VURB and VideoRewardBench. Further analysis confirms that VUP-35K enhances both reward performance and model reasoning capability, while VideoDRM and VideoGRM yield significant gains under best-of-$N$ test-time scaling.
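Best-of-$N$ selection with a reward model can be sketched as below. This is a minimal illustration only, not the VideoDRM or VideoGRM implementation: `best_of_n` and `toy_reward` are hypothetical names, and the toy scoring function merely stands in for a trained reward model.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)


def toy_reward(answer: str) -> float:
    # Hypothetical stand-in for a reward model: scores by word count.
    return float(len(answer.split()))


# Sample N = 3 candidate answers and keep the one the reward prefers.
picked = best_of_n(["a cat", "a cat on a mat", "cat"], toy_reward)
```

In practice the candidates would be sampled from a video-language model and scored by the reward model, so the quality of the selected answer depends on how well the reward ranks responses.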

preprint, 2022, arXiv

ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Age-related macular degeneration (AMD) is the leading cause of visual impairment among the elderly worldwide. Early detection of AMD is of great importance, as the vision loss caused by this disease is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. Cutting-edge deep learning algorithms have recently been developed for automatically detecting AMD from fundus images. However, there is still a lack of a comprehensive annotated dataset and standard evaluation benchmarks. To deal with this issue, we set up the Automatic Detection challenge on Age-related Macular degeneration (ADAM), which was held as a satellite event of the ISBI 2020 conference. The ADAM challenge consisted of four tasks which cover the main aspects of detecting and characterizing AMD from fundus images, including detection of AMD, detection and segmentation of optic disc, localization of fovea, and detection and segmentation of lesions. As part of the challenge, we have released a comprehensive dataset of 1200 fundus images with AMD diagnostic labels, pixel-wise segmentation masks for both optic disc and AMD-related lesions (drusen, exudates, hemorrhages and scars, among others), as well as the coordinates corresponding to the location of the macular fovea. A uniform evaluation framework has been built to make a fair comparison of different models using this dataset. During the challenge, 610 results were submitted for online evaluation, with 11 teams finally participating in the onsite challenge. This paper introduces the challenge, the dataset and the evaluation methods, summarizes the participating methods, and analyzes their results for each task. In particular, we observed that the ensembling strategy and the incorporation of clinical domain knowledge were key to improving the performance of the deep learning models.

preprint, 2022, arXiv

Adversarial Parameter Defense by Multi-Step Risk Minimization

Previous studies demonstrate DNNs' vulnerability to adversarial examples and show that adversarial training can establish a defense against them. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage loss change indicators for measuring the flatness of the loss basin and the parameter robustness of neural networks. On this basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To enhance neural networks, we propose the adversarial parameter defense algorithm that minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and accuracy of neural networks.
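A multi-step search for a loss-increasing parameter corruption can be sketched as projected gradient ascent. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: the L2-ball constraint, the step size, and the names `project_l2`, `multi_step_corruption`, `toy_loss` and `toy_grad` are all hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def project_l2(delta: Vector, eps: float) -> Vector:
    """Scale delta back onto the L2 ball of radius eps if it lies outside."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps or norm == 0.0:
        return list(delta)
    return [d * eps / norm for d in delta]


def multi_step_corruption(
    params: Vector,
    grad_fn: Callable[[Vector], Vector],
    eps: float,
    steps: int,
    step_size: float,
) -> Vector:
    """Gradient ascent on the loss over a corruption delta with ||delta||_2 <= eps."""
    delta = [0.0] * len(params)
    for _ in range(steps):
        corrupted = [p + d for p, d in zip(params, delta)]
        grad = grad_fn(corrupted)  # gradient of the loss at the corrupted point
        delta = [d + step_size * g for d, g in zip(delta, grad)]
        delta = project_l2(delta, eps)  # assumed constraint on corruption size
    return delta


def toy_loss(w: Vector) -> float:
    # Hypothetical quadratic loss standing in for a network's training loss.
    return sum(x * x for x in w)


def toy_grad(w: Vector) -> Vector:
    return [2.0 * x for x in w]


w = [1.0, -2.0]
delta = multi_step_corruption(w, toy_grad, eps=0.5, steps=5, step_size=0.1)
corrupted = [p + d for p, d in zip(w, delta)]
```

A defense in the spirit of the abstract would then train against the average loss over several such corruptions; that outer loop is omitted here.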

preprint, 2022, arXiv

DFTR: Depth-supervised Fusion Transformer for Salient Object Detection

Automated salient object detection (SOD) plays an increasingly crucial role in many computer vision applications. By reformulating the depth information as supervision rather than as input, depth-supervised convolutional neural networks (CNN) have achieved promising results on both RGB and RGB-D SOD scenarios with the merits of no requirements for extra depth networks and depth inputs in the inference stage. This paper, for the first time, seeks to expand the applicability of depth supervision to the Transformer architecture. Specifically, we develop a Depth-supervised Fusion TRansformer (DFTR), to further improve the accuracy of both RGB and RGB-D SOD. The proposed DFTR involves three primary features: 1) DFTR, to the best of our knowledge, is the first pure Transformer-based model for depth-supervised SOD; 2) A multi-scale feature aggregation (MFA) module is proposed to fully exploit the multi-scale features encoded by the Swin Transformer in a coarse-to-fine manner; 3) To enable bidirectional information flow across different streams of features, a novel multi-stage feature fusion (MFF) module is further integrated into our DFTR with the emphasis on salient regions at different network learning stages. We extensively evaluate the proposed DFTR on ten benchmarking datasets. Experimental results show that our DFTR consistently outperforms the existing state-of-the-art methods for both RGB and RGB-D SOD tasks. The code and model will be made publicly available.

preprint, 2022, arXiv

Distributional Correlation--Aware Knowledge Distillation for Stock Trading Volume Prediction

Traditional knowledge distillation in classification problems transfers the knowledge via class correlations in the soft label produced by teacher models, which are not available in regression problems like stock trading volume prediction. To remedy this, we present a novel distillation framework for training a light-weight student model to perform trading volume prediction given historical transaction data. Specifically, we turn the regression model into a probabilistic forecasting model, by training models to predict a Gaussian distribution to which the trading volume belongs. The student model can thus learn from the teacher at a more informative distributional level, by matching its predicted distributions to that of the teacher. Two correlational distillation objectives are further introduced to encourage the student to produce consistent pair-wise relationships with the teacher model. We evaluate the framework on a real-world stock volume dataset with two different time window settings. Experiments demonstrate that our framework is superior to strong baseline models, compressing the model size by $5\times$ while maintaining $99.6\%$ prediction accuracy. The extensive analysis further reveals that our framework is more effective than vanilla distillation methods under low-resource scenarios.
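Matching a student's predicted Gaussian to a teacher's can be illustrated with the closed-form KL divergence between two univariate Gaussians. The abstract does not say which divergence or direction is used, so the choice of KL(teacher || student) is an assumption, and `gaussian_kl` is a hypothetical helper name.

```python
import math


def gaussian_kl(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL( N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2) ) in closed form."""
    if sigma_p <= 0.0 or sigma_q <= 0.0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


# Assumed usage: teacher prediction as p, student prediction as q.
identical = gaussian_kl(0.0, 1.0, 0.0, 1.0)
shifted = gaussian_kl(1.0, 1.0, 0.0, 1.0)
```

The divergence is zero when the two predictions coincide and grows as the student's mean or variance drifts from the teacher's, which is what makes it usable as a distillation penalty.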

preprint, 2022, arXiv

Hierarchical Inductive Transfer for Continual Dialogue Learning

Pre-trained models have achieved excellent performance on the dialogue task. However, given the continual increase in online chit-chat scenarios, directly fine-tuning these models for each new task not only inflates the capacity required of dialogue systems on embedded devices but also causes knowledge forgetting in pre-trained models and knowledge interference among diverse dialogue tasks. In this work, we propose a hierarchical inductive transfer framework to learn and deploy the dialogue skills continually and efficiently. First, we introduce the adapter module into pre-trained models for learning new dialogue tasks. As the only trainable module, it allows the dialogue system on embedded devices to acquire new dialogue skills with negligible additional parameters. Then, to alleviate knowledge interference between tasks while still benefiting from the regularization between them, we further design hierarchical inductive transfer that enables new tasks to use general knowledge in the base adapter without being misled by diverse knowledge in task-specific adapters. Empirical evaluation and analysis indicate that our framework obtains comparable performance under deployment-friendly model capacity.

preprint, 2022, arXiv

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data

Since training a large-scale backdoored model from scratch requires a large training dataset, several recent attacks have considered injecting backdoors into a trained clean model without altering model behaviors on the clean data. Previous work finds that backdoors can be injected into a trained clean model with Adversarial Weight Perturbation (AWP). Here, AWPs refer to the variations of parameters that are small in backdoor learning. In this work, we observe an interesting phenomenon that the variations of parameters are always AWPs when tuning the trained clean model to inject backdoors. We further provide theoretical analysis to explain this phenomenon. We formulate the behavior of maintaining accuracy on clean data as the consistency of backdoored models, which includes both global consistency and instance-wise consistency. We extensively analyze the effects of AWPs on the consistency of backdoored models. In order to achieve better consistency, we propose a novel anchoring loss to anchor or freeze the model behaviors on the clean data, with a theoretical guarantee. Both the analytical and the empirical results validate the effectiveness of the anchoring loss in improving the consistency, especially the instance-wise consistency.

preprint, 2022, arXiv

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Video captioning combines video understanding and language generation. Unlike image captioning, which describes a static image with details of almost every object, video captioning usually considers a sequence of frames and biases towards focused objects, e.g., the objects that stay in focus regardless of the changing background. Therefore, detecting and properly accommodating focused objects is critical in video captioning. To enforce the description of focused objects and achieve controllable video captioning, we propose an Object-Oriented Non-Autoregressive approach (O2NA), which performs caption generation in three steps: 1) identify the focused objects and predict their locations in the target caption; 2) generate the related attribute words and relation words of these focused objects to form a draft caption; and 3) combine video information to refine the draft caption to a fluent final caption. Since the focused objects are generated and located ahead of other words, it is difficult to apply the word-by-word autoregressive generation process; instead, we adopt a non-autoregressive approach. The experiments on two benchmark datasets, i.e., MSR-VTT and MSVD, demonstrate the effectiveness of O2NA, which achieves results competitive with the state of the art but with both higher diversity and higher inference speed.

preprint, 2022, arXiv

PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing. In this paper, we build a new toolkit, named PKUSEG, for multi-domain word segmentation. Unlike existing single-model toolkits, PKUSEG targets multi-domain word segmentation and provides separate models for different domains, such as web, medicine, and tourism. In addition, because labeled data is lacking in many domains, we propose a domain adaptation paradigm to introduce cross-domain semantic knowledge via a translation system. Through this method, we generate synthetic data using a large amount of unlabeled data in the target domain and then obtain a word segmentation model for the target domain. We also further refine the performance of the default model with the help of synthetic data. Experiments show that PKUSEG achieves high performance on multiple domains. The new toolkit also supports POS tagging and model training to adapt to various application scenarios. The toolkit is now freely and publicly available for use in research and industry.

preprint, 2022, arXiv

REFUGE2 Challenge: A Treasure Trove for Multi-Dimension Analysis and Evaluation in Glaucoma Screening

With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets of CFPs in the ophthalmology community, large-scale datasets for screening only have labels of disease categories, and datasets with annotations of fundus structures are usually small in size. In addition, labeling standards are not uniform across datasets, and there is no clear information on the acquisition device. Here we release a multi-annotation, multi-quality, and multi-device color fundus image dataset for glaucoma analysis on an original challenge -- Retinal Fundus Glaucoma Challenge 2nd Edition (REFUGE2). The REFUGE2 dataset contains 2000 color fundus images with annotations of glaucoma classification, optic disc/cup segmentation, as well as fovea localization. Meanwhile, the REFUGE2 challenge sets three sub-tasks of automatic glaucoma diagnosis and fundus structure analysis and provides an online evaluation framework. Based on the characteristics of multi-device and multi-quality data, some methods with strong generalizations are provided in the challenge to make the predictions more robust. This shows that REFUGE2 brings attention to the characteristics of real-world multi-domain data, bridging the gap between scientific research and clinical application.

preprint, 2022, arXiv

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.

preprint, 2022, arXiv

Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification

Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, and their underlying working mechanism remains under-explored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework that considers the annotated information distribution across the graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL can bring a larger improvement than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.

preprint, 2021, arXiv

A representation formula for the probability density in stochastic dynamical systems with memory

Marcus stochastic delay differential equations (SDDEs) are often used to model stochastic dynamical systems with memory in science and engineering. Since no infinitesimal generators exist for Marcus SDDEs due to the non-Markovian property, conventional Fokker-Planck equations, which govern the evolution behavior of density, are not available for Marcus SDDEs. In this paper, we identify the Marcus SDDE with some Marcus stochastic differential equation (SDE) without delays but subject to extra constraints. This provides an efficient way to establish existence and uniqueness for the solution, and obtain a representation formula for probability density of the Marcus SDDE. In the formula, the probability density for Marcus SDDE is expressed in terms of that for Marcus SDE without delay.

preprint, 2021, arXiv

Collaborative Group Learning

Collaborative learning has successfully applied knowledge transfer to guide a pool of small student networks towards robust local minima. However, previous approaches typically struggle with drastically aggravated student homogenization when the number of students rises. In this paper, we propose Collaborative Group Learning, an efficient framework that aims to diversify the feature representation and conduct an effective regularization. Intuitively, similar to the human group study mechanism, we induce students to learn and exchange different parts of course knowledge as collaborative groups. First, each student is established by randomly routing on a modular neural network, which facilitates flexible knowledge communication between students due to random levels of representation sharing and branching. Second, to resist the student homogenization, students first compose diverse feature sets by exploiting the inductive bias from sub-sets of training data, and then aggregate and distill different complementary knowledge by imitating a random sub-group of students at each time step. Overall, the above mechanisms are beneficial for maximizing the student population to further improve the model generalization without sacrificing computational efficiency. Empirical evaluations on both image and text tasks indicate that our method significantly outperforms various state-of-the-art collaborative approaches whilst enhancing computational efficiency.

preprint, 2021, arXiv

Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

Neural dialogue models suffer from low-quality responses when interacted with in practice, demonstrating difficulty in generalization beyond training data. Recently, knowledge distillation has been used to successfully regularize the student by transferring knowledge from the teacher. However, the teacher and the student are trained on the same dataset and tend to learn similar feature representations, whereas the most general knowledge should be found through differences. The finding of general knowledge is further hindered by the unidirectional distillation, as the student should obey the teacher and may discard some knowledge that is truly general but refuted by the teacher. To this end, we propose a novel training framework, where the learning of general knowledge is more in line with the idea of reaching consensus, i.e., finding common knowledge that is beneficial to different yet all datasets through diversified learning partners. Concretely, the training task is divided into a group of subtasks with the same number of students. Each student assigned to one subtask is not only optimized on the allocated subtask but also imitates the multi-view feature representation aggregated from other students (i.e., student peers), which induces students to capture common knowledge among different subtasks and alleviates the over-fitting of students on the allocated subtasks. To further enhance generalization, we extend the unidirectional distillation to a bidirectional distillation that encourages the student and its student peers to co-evolve by exchanging complementary knowledge with each other. Empirical results and analysis demonstrate that our training framework effectively improves the model generalization without sacrificing training efficiency.

preprint, 2020, arXiv

AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography

Angle closure glaucoma (ACG) is a more aggressive disease than open-angle glaucoma, where the abnormal anatomical structures of the anterior chamber angle (ACA) may cause an elevated intraocular pressure and gradually lead to glaucomatous optic neuropathy and eventually to visual impairment and blindness. Anterior Segment Optical Coherence Tomography (AS-OCT) imaging provides a fast and contactless way to discriminate angle closure from open angle. Although many medical image analysis algorithms have been developed for glaucoma diagnosis, only a few studies have focused on AS-OCT imaging. In particular, there is no public AS-OCT dataset available for evaluating the existing methods in a uniform way, which limits progress in the development of automated techniques for angle closure detection and assessment. To address this, we organized the Angle closure Glaucoma Evaluation challenge (AGE), held in conjunction with MICCAI 2019. The AGE challenge consisted of two tasks: scleral spur localization and angle closure classification. For this challenge, we released a large dataset of 4800 annotated AS-OCT images from 199 patients, and also proposed an evaluation framework to benchmark and compare different models. During the AGE challenge, over 200 teams registered online, and more than 1100 results were submitted for online evaluation. Finally, eight teams participated in the onsite challenge. In this paper, we summarize these eight onsite challenge methods and analyze their corresponding results for the two tasks. We further discuss limitations and future directions. In the AGE challenge, the top-performing approach had an average Euclidean distance of 10 pixels (10 µm) in scleral spur localization, while in the task of angle closure classification, all the algorithms achieved satisfactory performance, with the two best obtaining an accuracy rate of 100%.
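The average Euclidean distance reported for scleral spur localization can be computed as in the sketch below. The exact evaluation protocol of the challenge is not given in the abstract, so this is only the plain definition of the metric over (x, y) pixel coordinates; `mean_euclidean_distance` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_euclidean_distance(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average pixel distance between predicted and annotated landmark positions."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth))
    return total / len(pred)


# Two landmarks: one predicted exactly, one off by a 3-4-5 triangle.
score = mean_euclidean_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

Converting the pixel distance to a physical length would additionally require the image's pixel spacing, which is not assumed here.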

preprint, 2020, arXiv

Blast-wave description of $\Upsilon$ elliptic flow at energies available at the CERN Large Hadron Collider

A simultaneous blast-wave fit to particle yields and elliptic flow ($v_{2}$) measured as a function of transverse momentum in Pb-Pb collisions at LHC energies is presented. A compact formula for the calculation of $v_2(p_T)$ for an elliptic freeze-out surface is used which follows from the Cooper-Frye ansatz without further assumptions. Over the full available $p_T$ range, the $\Upsilon$ elliptic flow data is described by the prediction based on the fit to lighter particles. This prediction shows that, due to the large $\Upsilon$ mass, a sizable elliptic flow is only expected at transverse momenta above 10 GeV/$c$.

preprint, 2020, arXiv

Building BROOK: A Multi-modal and Facial Video Database for Human-Vehicle Interaction Research

With the growing popularity of Autonomous Vehicles, more opportunities have bloomed in the context of Human-Vehicle Interactions. However, the lack of comprehensive and concrete database support for such a specific use case limits relevant studies across the whole design space. In this paper, we present our work-in-progress BROOK, a public multi-modal database with facial video records, which could be used to characterize drivers' affective states and driving styles. We first explain in detail how we over-engineer such a database, and what we have gained through a ten-month study. Then we showcase a Neural Network-based predictor, leveraging BROOK, which supports multi-modal prediction (including physiological data of heart rate and skin conductance and driving status data of speed) through facial videos. Finally, we discuss related issues when building such a database and our future directions in the context of BROOK. We believe BROOK is an essential building block for future Human-Vehicle Interaction Research.

preprint, 2020, arXiv

Exploring and Distilling Cross-Modal Information for Image Captioning

Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet there is still great difficulty for the current methods to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. Based on the Transformer, to perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our Transformer-based model achieves a CIDEr score of 129.3 in offline COCO evaluation on the COCO testing set with remarkable efficiency in terms of accuracy, speed, and parameter budget.

preprint, 2020, arXiv

HighwayGraph: Modelling Long-distance Node Relations for Improving General Graph Neural Network

Graph Neural Networks (GNNs) are efficient approaches to process graph-structured data. Modelling long-distance node relations is essential for GNN training and applications. However, conventional GNNs perform poorly in modelling long-distance node relations due to limited-layer information propagation. Existing studies focus on building deep GNN architectures, which face the over-smoothing issue and cannot model node relations over particularly long distances. To address this issue, we propose to model long-distance node relations by simply relying on shallow GNN architectures with two solutions: (1) implicit modelling by learning to predict node pair relations, and (2) explicit modelling by adding edges between nodes that potentially have the same label. To combine our two solutions, we propose a model-agnostic training framework named HighwayGraph, which overcomes the challenge of insufficient labeled nodes by sampling node pairs from the training set and adopting the self-training method. Extensive experimental results show that our HighwayGraph achieves consistent and significant improvements over four representative GNNs on three benchmark datasets.
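The explicit solution, adding edges between nodes that potentially share a label, can be sketched as follows. The abstract does not state how candidate pairs are selected, so the confidence threshold and the name `add_same_label_edges` are assumptions made purely for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def add_same_label_edges(
    edges: Set[Edge],
    predicted: Dict[int, Tuple[str, float]],
    threshold: float,
) -> Set[Edge]:
    """Connect every pair of confidently predicted nodes that share a label.

    predicted maps node id -> (predicted label, confidence).
    Edges are stored as (u, v) with u < v.
    """
    confident = {n: lab for n, (lab, conf) in predicted.items() if conf >= threshold}
    augmented = set(edges)
    for u, v in combinations(sorted(confident), 2):
        if confident[u] == confident[v]:
            augmented.add((u, v))  # shortcut edge between distant same-label nodes
    return augmented


graph = {(0, 1)}
preds = {0: ("a", 0.9), 1: ("b", 0.95), 2: ("a", 0.8), 3: ("a", 0.4)}
augmented = add_same_label_edges(graph, preds, threshold=0.7)
```

Node 3 is skipped because its prediction falls below the assumed threshold, so only nodes 0 and 2 gain a new edge. A shallow GNN run on the augmented graph can then pass messages between them in a single hop.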

preprint, 2020, arXiv

Investigation of the elliptic flow fluctuations of the identified particles using the A Multi-Phase Transport model

A Multi-Phase Transport (AMPT) model is used to study the elliptic flow fluctuations of identified particles using participant and spectator event planes. The elliptic flow measured using the first-order spectator event plane is expected to give the elliptic flow relative to the true reaction plane, which suppresses the flow fluctuations. However, the elliptic flow measured using the second-order participant plane is expected to capture the elliptic flow fluctuations. Our study shows that the first-order spectator event plane could be used to study the elliptic flow fluctuations of the identified particles in the AMPT model. The magnitude of the elliptic flow fluctuations shows weak dependence on particle species and transverse momentum. This observation will have important implications for understanding the source of the elliptic flow fluctuations.

preprint, 2020, arXiv

Jointly Modeling Aspect and Sentiment with Dynamic Heterogeneous Graph Neural Networks

Target-Based Sentiment Analysis aims to detect the opinion aspects (aspect extraction) and the sentiment polarities (sentiment detection) towards them. Both the previous pipeline and integrated methods fail to precisely model the innate connection between these two objectives. In this paper, we propose a novel dynamic heterogeneous graph to jointly model the two objectives in an explicit way. Both the ordinary words and sentiment labels are treated as nodes in the heterogeneous graph, so that the aspect words can interact with the sentiment information. The graph is initialized with multiple types of dependencies, and dynamically modified during real-time prediction. Experiments on the benchmark datasets show that our model outperforms the state-of-the-art models. Further analysis demonstrates that our model obtains significant performance gain on the challenging instances under multiple-opinion aspects and no-opinion aspect situations.

preprint, 2020, arXiv

Mining Commonsense Facts from the Physical World

Textual descriptions of the physical world implicitly mention commonsense facts, while commonsense knowledge bases explicitly represent such facts as triples. Compared with the dramatic growth of text data, the coverage of existing knowledge bases is far from complete. Most prior studies on populating knowledge bases focus on Freebase. Automatically completing commonsense knowledge bases to improve their coverage remains under-explored. In this paper, we propose a new task of mining commonsense facts from the raw text that describes the physical world. We build an effective new model that fuses information from both sequence text and existing knowledge base resources. Then we create two large annotated datasets, each with approximately 200k instances, for commonsense knowledge base completion. Empirical results demonstrate that our model significantly outperforms baselines.

preprint, 2020, arXiv

Parallel Data Augmentation for Formality Style Transfer

The main barrier to progress in the task of Formality Style Transfer is the inadequacy of training data. In this paper, we study how to augment parallel data and propose novel and simple data augmentation methods for this task to obtain useful sentence pairs with easily accessible models and systems. Experiments demonstrate that our augmented parallel data largely helps improve formality style transfer when it is used to pre-train the model, leading to the state-of-the-art results in the GYAFC benchmark dataset.

preprint, 2020, arXiv

The STAR Event Plane Detector

The Event Plane Detector (EPD) is an upgrade detector to the STAR experiment at RHIC, designed to measure the pattern of forward-going charged particles emitted in a high-energy collision between heavy nuclei. It consists of two highly-segmented disks of 1.2-cm-thick scintillator embedded with wavelength-shifting fiber, coupled to silicon photomultipliers and custom electronics. We describe the general design of the device, its construction, and performance on the bench and in the experiment.

preprint2016arXiv

A New Recurrent Neural CRF for Learning Non-linear Edge Features

Conditional Random Field (CRF) and recurrent neural models have achieved success in structured prediction. More recently, CRF and recurrent neural models have been combined, so that we can gain from both non-linear dense features and the globally normalized CRF objective. These recurrent neural CRF models mainly focus on encoding node features in CRF undirected graphs. However, edge features prove important to CRF in structured prediction. In this work, we introduce a new recurrent neural CRF model, which learns non-linear edge features, so that non-linear features are encoded completely. We compare our model with different neural models in well-known structured prediction tasks. Experiments show that our model outperforms state-of-the-art methods in NP chunking, shallow parsing, Chinese word segmentation and POS tagging.

preprint2016arXiv

Derivation of Fokker-Planck equations for stochastic dynamical systems under excitation of multiplicative non-Gaussian white noise

Fokker-Planck equations describe the time evolution of probability densities of stochastic dynamical systems and play an important role in quantifying the propagation and evolution of uncertainty. Although Fokker-Planck equations can be written explicitly for nonlinear dynamical systems excited by Gaussian white noise, they are not available in general for nonlinear dynamical systems excited by multiplicative non-Gaussian white noise. Marcus stochastic differential equations are often appropriate models in engineering and physics for stochastic dynamical systems excited by non-Gaussian white noise. In this paper, we derive explicit forms of Fokker-Planck equations for one-dimensional systems modeled by Marcus stochastic differential equations under multiplicative non-Gaussian white noise. As examples to illustrate the theoretical results, the derived formula is used to obtain Fokker-Planck equations for nonlinear dynamical systems under excitation of (i) $α$-stable white noise; (ii) combined Gaussian and Poisson white noise, respectively.

preprint2016arXiv

Fokker-Planck equations for Marcus stochastic differential equations driven by Levy processes

Marcus stochastic differential equations (SDEs) are often appropriate models for stochastic dynamical systems driven by non-Gaussian Levy processes and have wide applications in engineering and physical sciences. The probability density of the solution to an SDE offers complete statistical information on the underlying stochastic process. The explicit formula for the Fokker-Planck equation, the governing equation for the probability density, is well known when the SDE is driven by a Brownian motion. In this paper, we address the open question of finding the Fokker-Planck equations for Marcus SDEs in arbitrary dimensions driven by non-Gaussian Levy processes. The equations are given in a simple form that facilitates theoretical analysis and numerical computation. Several examples are presented to illustrate how the theoretical results can be applied to obtain Fokker-Planck equations for Marcus SDEs driven by Levy processes.
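
For orientation, the well-known Brownian-motion case mentioned in the abstract can be written out as follows. This is the standard textbook form, not a result taken from the paper; the scalar Itô SDE $dX_t = f(X_t)\,dt + σ(X_t)\,dB_t$ and the symbols $f$, $σ$, $B_t$ and $p(x,t)$ (the probability density of $X_t$) are notation assumed here for illustration.

$$
\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\big[f(x)\,p(x,t)\big] + \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}\big[σ^{2}(x)\,p(x,t)\big]
$$

The Levy-driven Marcus case treated in the paper is not reproduced here.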

preprint2016arXiv

Governing equations for Probability densities of stochastic differential equations with discrete time delays

The time evolution of probability densities for solutions to stochastic differential equations (SDEs) without delay is usually described by Fokker-Planck equations, which require the adjoint of the infinitesimal generator for the solutions. However, Fokker-Planck equations do not exist for stochastic delay differential equations (SDDEs) because the solutions to SDDEs are not Markov processes and have no corresponding infinitesimal generators. In this paper, we address the open question of finding the governing equations for probability densities of SDDEs with discrete time delays. The governing equation is given in a simple form that facilitates theoretical analysis and numerical computation. An illustrative example is presented to verify the proposed governing equations.

preprint2015arXiv

Elliptic and Triangular Flow of Identified Particles from the AMPT Model at RHIC Energies

The elliptic flow ($v_{2}$) at $\sqrt{s_{\rm NN}} = $ 11.5, 39, and 200 GeV and triangular flow ($v_{3}$) at $\sqrt{s_{\rm NN}} = $ 200 GeV of identified particles ($π^{\pm}, K^{\pm}, K^{0}_{S}, p, \bar{p}, ϕ, Λ$ and $\barΛ$) from 0\%--80\% central Au+Au collisions are analyzed using a multiphase transport (AMPT) model. It is shown that the experimental results from the $η$-sub event plane method can be reproduced with a parton scattering cross section between 1.5 and 3 mb. We also study the differential and integrated $v_{2}/v_{3}$ ratios and conclude that they are anti-correlated with the parton scattering cross section.

preprint2015arXiv

Structure Regularization for Structured Prediction: Theories and Experiments

While there are many studies on weight regularization, the study on structure regularization is rare. Many existing systems on structured prediction focus on increasing the level of structural dependencies within the model. However, this trend could have been misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method can also substantially accelerate the training speed. The method and the theoretical results can apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly-competitive tasks, achieving state-of-the-art accuracies yet with substantially faster training speed.
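
The idea of decomposing a training sample into mini-samples with simpler structures can be illustrated for sequence labeling. The following is a minimal sketch only: the function name `decompose` and the `max_len` parameter are hypothetical, and splitting into contiguous fixed-length chunks is an assumption made for illustration, not necessarily the decomposition scheme used in the paper.

```python
from typing import List, Sequence, Tuple


def decompose(
    tokens: Sequence[str], labels: Sequence[int], max_len: int
) -> List[Tuple[List[str], List[int]]]:
    """Split one labeled sequence into contiguous mini-samples.

    Each mini-sample holds at most `max_len` positions, so label
    dependencies are only modeled within a chunk (assumed scheme).
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [
        (list(tokens[i:i + max_len]), list(labels[i:i + max_len]))
        for i in range(0, len(tokens), max_len)
    ]
```

A training set would then be built by applying `decompose` to every original sample and training on the resulting shorter sequences.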

preprint2014arXiv

Exact Decoding on Latent Variable Conditional Models is NP-Hard

Latent variable conditional models, including the latent conditional random fields as a special case, are popular models for many natural language processing and vision processing tasks. The computational complexity of the exact decoding/inference in latent conditional random fields is unclear. In this paper, we try to clarify the computational complexity of the exact decoding. We analyze the complexity and demonstrate that it is an NP-hard problem even in a sequential labeling setting. Furthermore, we propose the latent-dynamic inference (LDI-Naive) method and its bounded version (LDI-Bounded), which are able to perform exact-inference or almost-exact-inference by using top-$n$ search and dynamic programming.

preprint2013arXiv

State estimation under non-Gaussian Levy noise: A modified Kalman filtering method

The Kalman filter is extensively used for state estimation for linear systems under Gaussian noise. When non-Gaussian Lévy noise is present, the conventional Kalman filter may fail to be effective due to the fact that the non-Gaussian Lévy noise may have infinite variance. A modified Kalman filter for linear systems with non-Gaussian Lévy noise is devised. It works effectively with reasonable computational cost. Simulation results are presented to illustrate this non-Gaussian filtering method.
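
As a point of reference, one predict/update cycle of the conventional Kalman filter for a linear Gaussian system can be sketched as below. This is the standard textbook filter, not the modified method proposed in the paper; the function name `kalman_step` and the matrix names `A`, `H`, `Q`, `R` are notation assumed here for illustration.

```python
import numpy as np


def kalman_step(x, P, z, A, H, Q, R):
    """One cycle of the conventional Kalman filter.

    Assumed model: x_k = A x_{k-1} + w,  z_k = H x_k + v,
    with w ~ N(0, Q) and v ~ N(0, R).
    """
    # Predict: propagate the state estimate and its covariance.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: weigh the measurement residual by the Kalman gain.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

The gain depends on the noise covariances `Q` and `R`, which is why infinite-variance noise, as described in the abstract, is a problem for this form of the filter.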

preprint2012arXiv

An alternative expression of Di Paola and Falson's formula for stochastic dynamics

Di Paola and Falsone's formula is widely used in studying stochastic dynamics of nonlinear systems under Poisson white noise. In this short communication, an alternative expression is presented. Compared to Di Paola and Falsone's original expression, the alternative one is applicable under more general conditions, and shows significantly improved performance in numerical implementation. The alternative expression turns out to be a special case of the Marcus integrals.

preprint2012arXiv

Fokker-Planck equations for nonlinear dynamical systems driven by non-Gaussian Levy processes

The Fokker-Planck equations describe time evolution of probability densities of stochastic dynamical systems and are thus widely used to quantify random phenomena such as uncertainty propagation. For dynamical systems driven by non-Gaussian Lévy processes, however, it is difficult to obtain explicit forms of Fokker-Planck equations because the adjoint operators of the associated infinitesimal generators usually do not have exact formulation. In the present paper, Fokker-Planck equations are derived in terms of infinite series for nonlinear stochastic differential equations with non-Gaussian Lévy processes. A few examples are presented to illustrate the method.

preprint2012arXiv

Modeling nonlinear random vibration: Implication of the energy conservation law

Nonlinear random vibration under excitations of both Gaussian and Poisson white noises is considered. The model is based on stochastic differential equations, and the corresponding stochastic integrals are defined in such a way that the energy conservation law is satisfied. It is shown that the Stratonovich integral and the Di Paola-Falsone integral should be used for excitations of Gaussian and Poisson white noises, respectively, in order for the model to satisfy the underlying physical laws (e.g., energy conservation). Numerical examples are presented to illustrate the theoretical results.

preprint2011arXiv

Approximation of invariant foliations for stochastic dynamical systems

Invariant foliations are geometric structures for describing and understanding the qualitative behaviors of nonlinear dynamical systems. For stochastic dynamical systems, however, these geometric structures themselves are complicated random sets. Thus it is desirable to have some techniques to approximate random invariant foliations. In this paper, invariant foliations are approximated for dynamical systems with small noisy perturbations, via asymptotic analysis. Namely, random invariant foliations are represented as a perturbation of the deterministic invariant foliations, with deviation errors estimated.

37 published item(s)

Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models

ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Adversarial Parameter Defense by Multi-Step Risk Minimization

DFTR: Depth-supervised Fusion Transformer for Salient Object Detection

Distributional Correlation--Aware Knowledge Distillation for Stock Trading Volume Prediction

Hierarchical Inductive Transfer for Continual Dialogue Learning

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

REFUGE2 Challenge: A Treasure Trove for Multi-Dimension Analysis and Evaluation in Glaucoma Screening

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification

A representation formula for the probability density in stochastic dynamical systems with memory

Collaborative Group Learning

Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography

Blast-wave description of $Υ$ elliptic flow at energies available at the CERN Large Hadron Collider

Building BROOK: A Multi-modal and Facial Video Database for Human-Vehicle Interaction Research

Exploring and Distilling Cross-Modal Information for Image Captioning

HighwayGraph: Modelling Long-distance Node Relations for Improving General Graph Neural Network

Investigation of the elliptic flow fluctuations of the identified particles using the A Multi-Phase Transport model

Jointly Modeling Aspect and Sentiment with Dynamic Heterogeneous Graph Neural Networks

Mining Commonsense Facts from the Physical World

Parallel Data Augmentation for Formality Style Transfer

The STAR Event Plane Detector

A New Recurrent Neural CRF for Learning Non-linear Edge Features

Derivation of Fokker-Planck equations for stochastic dynamical systems under excitation of multiplicative non-Gaussian white noise

Fokker-Planck equations for Marcus stochastic differential equations driven by Levy processes

Governing equations for Probability densities of stochastic differential equations with discrete time delays

Elliptic and Triangular Flow of Identified Particles from the AMPT Model at RHIC Energies

Structure Regularization for Structured Prediction: Theories and Experiments

Exact Decoding on Latent Variable Conditional Models is NP-Hard

State estimation under non-Gaussian Levy noise: A modified Kalman filtering method

An alternative expression of Di Paola and Falson's formula for stochastic dynamics

Fokker-Planck equations for nonlinear dynamical systems driven by non-Gaussian Levy processes

Modeling nonlinear random vibration: Implication of the energy conservation law

Approximation of invariant foliations for stochastic dynamical systems