Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
38 works
0 followers
28 topics
4 close collaborators

Published work

38 published item(s)

preprint, 2026, arXiv

Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning

The widespread proliferation of online content has intensified concerns about clickbait, deceptive or exaggerated headlines designed to attract attention. While Large Language Models (LLMs) offer a promising avenue for addressing this issue, their effectiveness is often hindered by Sycophancy, a tendency to produce reasoning that matches users' beliefs over truthful ones, which deviates from instruction-following principles. Rather than treating sycophancy as a flaw to be eliminated, this work proposes a novel approach that initially harnesses this behavior to generate contrastive reasoning from opposing perspectives. Specifically, we design a Self-renewal Opposing-stance Reasoning Generation (SORG) framework that prompts LLMs to produce high-quality agree and disagree reasoning pairs for a given news title without requiring ground-truth labels. To utilize the generated reasoning, we develop a local Opposing Reasoning-based Clickbait Detection (ORCD) model that integrates three BERT encoders to represent the title and its associated reasoning. The model leverages contrastive learning, guided by soft labels derived from LLM-generated credibility scores, to enhance detection robustness. Experimental evaluations on three benchmark datasets demonstrate that our method consistently outperforms LLM prompting, fine-tuned smaller language models, and state-of-the-art clickbait detection baselines.

preprint, 2026, arXiv

Alethia: A Foundational Encoder for Voice Deepfakes

Existing voice deepfake detection and localization models rely heavily on representations extracted from speech foundation models (SFMs). However, downstream finetuning has now reached a state of diminishing returns. In this paper, we shift the focus to pretraining and propose a novel recipe that combines bottleneck masked embedding prediction with flow-matching based spectrogram reconstruction. The outcome, Alethia, is the first foundational audio encoder for various voice deepfake detection and localization tasks. We evaluate on $5$ different tasks with $56$ benchmark datasets, and note Alethia significantly outperforms state-of-the-art SFMs with superior robustness to real-world perturbations and zero-shot generalization to unseen domains (e.g., singing deepfakes). We also demonstrate the limitation of discrete targets in masked token prediction, and show the importance of continuous embedding prediction and generative pretraining for capturing deepfake artifacts.

preprint, 2023, arXiv

MixGen: A New Multi-Modal Data Augmentation

Data augmentation is a necessity to enhance data efficiency in deep learning. For vision-language pre-training, data is only augmented either for images or for text in previous works. In this paper, we present MixGen: a joint data augmentation for vision-language representation learning to further improve data efficiency. It generates new image-text pairs with semantic relationships preserved by interpolating images and concatenating text. It is simple and can be plugged into existing pipelines. We evaluate MixGen on four architectures, including CLIP, ViLT, ALBEF and TCL, across five downstream vision-language tasks to show its versatility and effectiveness. For example, adding MixGen in ALBEF pre-training leads to absolute performance improvements on downstream tasks: image-text retrieval (+6.2% on COCO fine-tuned and +5.3% on Flickr30K zero-shot), visual grounding (+0.9% on RefCOCO+), visual reasoning (+0.9% on NLVR2), visual question answering (+0.3% on VQA2.0), and visual entailment (+0.4% on SNLI-VE).
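As an illustration of the "interpolating images and concatenating text" step described in the MixGen abstract, here is a minimal sketch. The function name `mixgen_pair`, the nested-list image type, and the default mixing weight `lam=0.5` are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List, Tuple

# Assumption for this sketch: a tiny grayscale image stored as rows of pixel values.
Image = List[List[float]]


def mixgen_pair(
    img_a: Image, text_a: str, img_b: Image, text_b: str, lam: float = 0.5
) -> Tuple[Image, str]:
    """Blend two images pixel-wise and concatenate their captions.

    `lam` is a hypothetical mixing weight: lam * img_a + (1 - lam) * img_b.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    # The new caption simply joins both source captions.
    return mixed, text_a + " " + text_b
```

Calling `mixgen_pair([[0.0, 1.0]], "a dog", [[1.0, 1.0]], "a cat")` would blend the two one-row images and return the caption "a dog a cat".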

preprint, 2022, arXiv

ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts

Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i.e., make instruction-asked actions sequentially in complex visual environments. Most existing VLN agents learn the instruction-path data directly and cannot sufficiently explore action-level alignment knowledge inside the multi-modal inputs. In this paper, we propose modAlity-aligneD Action PrompTs (ADAPT), which provides the VLN agent with action prompts to enable the explicit learning of action-level modality alignment to pursue successful navigation. Specifically, an action prompt is defined as a modality-aligned pair of an image sub-prompt and a text sub-prompt, where the former is a single-view observation and the latter is a phrase like "walk past the chair". When starting navigation, the instruction-related action prompt set is retrieved from a pre-built action prompt base and passed through a prompt encoder to obtain the prompt feature. Then the prompt feature is concatenated with the original instruction feature and fed to a multi-layer transformer for action prediction. To collect high-quality action prompts into the prompt base, we use the Contrastive Language-Image Pretraining (CLIP) model which has powerful cross-modality alignment ability. A modality alignment loss and a sequential consistency loss are further introduced to enhance the alignment of the action prompt and enforce the agent to focus on the related prompt sequentially. Experimental results on both R2R and RxR show the superiority of ADAPT over state-of-the-art methods.

preprint, 2022, arXiv

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

Multiple datasets and open challenges for object detection have been introduced in recent years. To build more general and powerful object detection systems, in this paper, we construct a new large-scale benchmark termed BigDetection. Our goal is to simply leverage the training data from existing datasets (LVIS, OpenImages and Objects365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. Specifically, we generate a new taxonomy which unifies the heterogeneous label spaces from different sources. Our BigDetection dataset has 600 object categories and contains over 3.4M training images with 36M bounding boxes. It is much larger in multiple dimensions than previous benchmarks, which offers both opportunities and challenges. Extensive experiments demonstrate its validity as a new benchmark for evaluating different object detection methods, and its effectiveness as a pre-training dataset.

preprint, 2022, arXiv

Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis

Building Spoken Language Understanding (SLU) robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants. Considering that most ASR errors are caused by phonetic confusion between similar-sounding expressions, intuitively, leveraging the phoneme sequence of speech can complement the ASR hypothesis and enhance the robustness of SLU. This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). The cross attention block is devised to capture the fine-grained interactions between phoneme and word embeddings, so that the joint representations capture the phonetic and semantic features of the input simultaneously and overcome ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments are conducted on three datasets, showing the effectiveness and competitiveness of our approach. Additionally, we validate the universality of CASLU and show its complementarity when combined with other robust SLU techniques.
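To make the idea of cross attention between word and phoneme embeddings concrete, here is a minimal sketch of plain scaled dot-product cross attention. It assumes word embeddings act as queries and phoneme embeddings as keys and values, and it has no learned projections. The abstract does not state these details, so this is a generic illustration and not the CASLU architecture.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(words: List[Vector], phonemes: List[Vector]) -> List[Vector]:
    """Attend from each word embedding (query) over phoneme embeddings (keys/values).

    Assumption: both embedding sets share one dimension and no learned
    projection matrices are applied.
    """
    dim = len(phonemes[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in words:
        # Scaled dot-product score between this word and every phoneme.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in phonemes]
        weights = softmax(scores)
        # Weighted sum of phoneme vectors gives the phoneme-aware word feature.
        outputs.append(
            [sum(w * key[i] for w, key in zip(weights, phonemes)) for i in range(dim)]
        )
    return outputs
```

Each output row is a phoneme-aware representation of one word, which a downstream model could combine with the original word embedding.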

preprint, 2022, arXiv

Chinese Idiom Paraphrasing

Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters. Due to their non-compositionality and metaphorical meaning, Chinese idioms are hard for children and non-native speakers to understand. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase sentences containing idioms into non-idiomatic ones while preserving the original sentence's meaning. Since sentences without idioms are more easily handled by Chinese NLP systems, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation, Chinese idiom cloze, and Chinese idiom embeddings. In this study, the CIP task is treated as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,530 sentence pairs. We further deploy three baselines and two novel CIP approaches to deal with CIP problems. The results show that the proposed methods perform better than the baselines on the established CIP dataset.

preprint, 2022, arXiv

Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue

Turn-taking, aiming to decide when the next speaker can start talking, is an essential component in building human-robot spoken dialogue systems. Previous studies indicate that multimodal cues can facilitate this challenging task. However, due to the paucity of public multimodal datasets, current methods are mostly limited to either utilizing unimodal features or simplistic multimodal ensemble models. Besides, the inherent class imbalance in real scenarios, e.g., a sentence ending with a short pause will mostly be regarded as the end of a turn, also poses a great challenge to the turn-taking decision. In this paper, we first collect a large-scale annotated corpus for turn-taking with over 5,000 real human-robot dialogues in speech and text modalities. Then, a novel gated multimodal fusion mechanism is devised to utilize various information seamlessly for turn-taking prediction. More importantly, to tackle the data imbalance issue, we design a simple yet effective data augmentation method to construct negative instances without supervision and apply contrastive learning to obtain better feature representations. Extensive experiments are conducted and the results demonstrate the superiority and competitiveness of our model over several state-of-the-art baselines.

preprint, 2022, arXiv

Global classical solutions of 3D compressible viscoelastic system near equilibrium

In this paper, we prove the global existence of general small solutions to the compressible viscoelastic system. We remove the "initial state" assumption ($\tilde{\rho}_0 \det F_0 = 1$) and the "div-curl" structure assumption compared with previous works. This broadens the class of solutions to a great extent: more precisely, the initial density state need not be constant, and no further structure is needed for global well-posedness. This is quite different from the elasticity system, in which structure plays an important role. Since we cannot obtain any dissipation information for the density and deformation tensor, we introduce a new effective flux in the spirit of regarding the wildest "nonlinear term" as a "linear term". Although the norms of the solution may now increase, we can still derive global existence.

preprint, 2022, arXiv

Global well-posedness for 2D non-resistive compressible MHD system in periodic domain

This paper focuses on the 2D compressible magnetohydrodynamic (MHD) equations without magnetic diffusion in a periodic domain. We present a systematic approach to establishing the global existence of smooth solutions when the initial data is close to a background magnetic field. In addition, stability and large-time decay rates are also obtained. When there is no magnetic diffusion, the magnetic field and the density are governed by forced transport equations and the problem considered here is difficult. This paper implements several key observations and ideas to maximize the enhanced dissipation due to hidden structures and interactions. In particular, the weak smoothing and stabilization generated by the background magnetic field and the extra regularization in the divergence part of the velocity field are fully exploited. Compared with previous works, this paper appears to be the first to investigate such a system on bounded domains and the first to solve this problem by pure energy estimates, which help reduce the complexity of other approaches. In addition, this paper combines the well-posedness with the precise large-time behavior, a strategy that can be extended to higher dimensions.

preprint, 2022, arXiv

ImpDet: Exploring Implicit Fields for 3D Object Detection

Conventional 3D object detection approaches concentrate on bounding boxes representation learning with several parameters, i.e., localization, dimension, and orientation. Despite its popularity and universality, such a straightforward paradigm is sensitive to slight numerical deviations, especially in localization. By exploiting the property that point clouds are naturally captured on the surface of objects along with accurate location and intensity information, we introduce a new perspective that views bounding box regression as an implicit function. This leads to our proposed framework, termed Implicit Detection or ImpDet, which leverages implicit field learning for 3D object detection. Our ImpDet assigns specific values to points in different local 3D spaces, thereby high-quality boundaries can be generated by classifying points inside or outside the boundary. To solve the problem of sparsity on the object surface, we further present a simple yet efficient virtual sampling strategy to not only fill the empty region, but also learn rich semantic features to help refine the boundaries. Extensive experimental results on KITTI and Waymo benchmarks demonstrate the effectiveness and robustness of unifying implicit fields into object detection.

preprint, 2022, arXiv

Large-scale Dynamics of Winds Driven by Line Force from a Thin Accretion Disk

Winds play a significant role in the active galactic nuclei feedback process. Previous simulations studying winds only focus on a small dynamical range. Therefore, it is unknown how far the winds can go and what the properties of the winds will be if they can move to large radii. We perform simulations to study the large-scale dynamics of winds driven by line force. We find that the properties of the winds depend on both black hole mass ($M_{BH}$) and accretion disk luminosity. When the accretion disk luminosity is $0.6L_{edd}$ ($L_{edd}$ being the Eddington luminosity), independent of $M_{BH}$, the winds have kinetic energy flux exceeding $1\% L_{edd}$ and can escape from the black hole potential. For the case with the accretion disk luminosity equaling $0.3L_{edd}$, the strength of the winds decreases with the decrease of $M_{BH}$. If $M_{BH}$ decreases from $10^9$ to $10^6$ solar masses ($M_\odot$), the kinetic energy flux of the winds decreases from $\sim 0.01 L_{edd}$ to $\sim 10^{-6} L_{edd}$. In the case of $M_{BH}\geq 10^7 M_\odot$, the winds can escape from the black hole potential. In the case of $M_{BH}=10^6 M_\odot$, the winds cannot escape. We find that for the ultra-fast winds observed in hard X-ray bands (Gofford et al. 2015), the observed dependence of the mass flux and the kinetic energy flux on accretion disk luminosity can be well reproduced by the line-force-driven wind model. We also find that the properties of the ultra-fast winds observed in soft X-ray bands can be explained by the line-force-driven wind model.

preprint, 2022, arXiv

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered. The individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of the five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. Together, the submissions gauge the state of the art in efficient single image super-resolution.
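The rank-sum rule described for sub-track one and sub-track two can be sketched as follows. The helper names `rank_positions` and `rank_sum`, the team labels, and the tie-breaking by team name are assumptions for illustration only. The abstract does not say how ties were resolved, and this is not the organizers' scoring script.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank teams on one metric where lower is better; rank 1 is best.

    Assumption: ties are broken alphabetically by team name.
    """
    ordered = sorted(scores, key=lambda team: (scores[team], team))
    return {team: pos for pos, team in enumerate(ordered, start=1)}


def rank_sum(metrics: Dict[str, Dict[str, float]]) -> List[str]:
    """Sum each team's per-metric ranks and order teams by the total."""
    totals: Dict[str, int] = {}
    for per_team in metrics.values():
        for team, pos in rank_positions(per_team).items():
            totals[team] = totals.get(team, 0) + pos
    return sorted(totals, key=lambda team: (totals[team], team))


# Hypothetical numbers for three made-up teams (parameters in millions, FLOPs in G).
example = {
    "params": {"A": 0.5, "B": 0.3, "C": 0.9},
    "flops": {"A": 20.0, "B": 30.0, "C": 25.0},
}
final_order = rank_sum(example)
```

In this made-up example, team A ranks second on parameters and first on FLOPs, so its rank sum of 3 places it ahead of B (4) and C (5).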

preprint, 2022, arXiv

Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition

Existing out-of-distribution (OOD) detection methods are typically benchmarked on training sets with balanced class distributions. However, in real-world applications, it is common for the training sets to have long-tailed distributions. In this work, we first demonstrate that existing OOD detection methods commonly suffer from significant performance degradation when the training set is long-tail distributed. Through analysis, we posit that this is because the models struggle to distinguish the minority tail-class in-distribution samples, from the true OOD samples, making the tail classes more prone to be falsely detected as OOD. To solve this problem, we propose Partial and Asymmetric Supervised Contrastive Learning (PASCL), which explicitly encourages the model to distinguish between tail-class in-distribution samples and OOD samples. To further boost in-distribution classification accuracy, we propose Auxiliary Branch Finetuning, which uses two separate branches of BN and classification layers for anomaly detection and in-distribution classification, respectively. The intuition is that in-distribution and OOD anomaly data have different underlying distributions. Our method outperforms previous state-of-the-art method by $1.29\%$, $1.45\%$, $0.69\%$ anomaly detection false positive rate (FPR) and $3.24\%$, $4.06\%$, $7.89\%$ in-distribution classification accuracy on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT, respectively. Code and pre-trained models are available at https://github.com/amazon-research/long-tailed-ood-detection.

preprint, 2022, arXiv

Pixel-level Correspondence for Self-Supervised Learning from Video

While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
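As a rough illustration of how an optical-flow field can yield a correspondence map between two frames, the sketch below follows each pixel along its flow vector and keeps the matches that land inside the image. The flow layout (`flow[y][x] = (dx, dy)`), the nearest-pixel rounding, and the function name are assumptions for this example, not details taken from the PiCo paper.

```python
import math
from typing import Dict, List, Tuple

# Assumption: flow[y][x] holds the (dx, dy) displacement of pixel (x, y) in pixels.
Flow = List[List[Tuple[float, float]]]
Pixel = Tuple[int, int]


def correspondence_map(flow: Flow) -> Dict[Pixel, Pixel]:
    """Map each source pixel (x, y) to its tracked location in the next frame.

    Pixels whose tracked location falls outside the frame are dropped.
    """
    height = len(flow)
    width = len(flow[0]) if height else 0
    matches: Dict[Pixel, Pixel] = {}
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Round to the nearest pixel centre (half values round up).
            tx = math.floor(x + dx + 0.5)
            ty = math.floor(y + dy + 0.5)
            if 0 <= tx < width and 0 <= ty < height:
                matches[(x, y)] = (tx, ty)
    return matches
```

Local features at matched locations could then be treated as positive pairs in a dense contrastive objective.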

preprint, 2022, arXiv

Prompt-Learning for Short Text Classification

In short texts, the extremely short length, feature sparsity, and high ambiguity pose huge challenges to classification tasks. Recently, as an effective method for tuning Pre-trained Language Models for specific downstream tasks, prompt-learning has attracted a vast amount of attention and research. The main intuition behind prompt-learning is to insert a template into the input and convert text classification tasks into equivalent cloze-style tasks. However, most prompt-learning methods expand label words manually or only consider the class name for knowledge incorporation in cloze-style prediction, which will inevitably incur omissions and bias in short text classification tasks. In this paper, we propose a simple short text classification approach that makes use of prompt-learning based on knowledgeable expansion. Taking the special characteristics of short text into consideration, the method can consider both the short text itself and the class name when expanding the label word space. Specifically, the top $N$ concepts related to the entity in the short text are retrieved from an open knowledge graph such as Probase, and we further refine the expanded label words by calculating the distance between selected concepts and class labels. Experimental results show that our approach obtains obvious improvement compared with other fine-tuning, prompt-learning, and knowledgeable prompt-tuning methods, outperforming the state-of-the-art by up to 6 Accuracy points on three well-known datasets.

preprint, 2022, arXiv

Traveling edge states in massive Dirac equations along slowly varying edges

Topologically protected wave motion has attracted considerable interest due to its novel properties and potential applications in many different fields. In this work, we study edge modes and traveling edge states via the linear Dirac equations with so-called domain wall masses. The unidirectional edge state provides a heuristic approach to more general traveling edge states through the localized behavior along slowly varying edges. We show the leading asymptotic solutions of two typical edge states that follow the circular and curved edges with small curvature by analytic and quantitative arguments.

preprint, 2021, arXiv

A phase field model for mass transport with semi-permeable interfaces

In this paper, a thermodynamically consistent model for mass transfer across permeable moving interfaces is proposed by using the energy variation method. We consider a restricted diffusion problem where the flux across the interface depends on its conductance and the difference of the concentration on each side. The diffusive interface phase-field framework used here has several advantages over the sharp interface method. First of all, explicit tracking of the interface is no longer necessary. Secondly, the interfacial condition can be incorporated with a variable diffusion coefficient. A detailed asymptotic analysis confirms the diffusive interface model converges to the existing sharp interface model as the interface thickness goes to zero. A decoupled energy stable numerical scheme is developed to solve this system efficiently. Numerical simulations first illustrate the consistency of theoretical results on the sharp interface limit. Then a convergence study and energy decay test are conducted to ensure the efficiency and stability of the numerical scheme. To illustrate the effectiveness of our phase-field approach, several examples are provided, including a study of a two-phase mass transfer problem where drops with deformable interfaces are suspended in a moving fluid.

preprint, 2021, arXiv

Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification

Semi-supervised learning through deep generative models and multi-lingual pretraining techniques have both achieved tremendous success across different areas of NLP. Nonetheless, their development has happened in isolation, while the combination of both could potentially be effective for tackling task-specific labelled data shortage. To bridge this gap, we combine semi-supervised deep generative models and multi-lingual pretraining to form a pipeline for the document classification task. Compared to strong supervised learning baselines, our semi-supervised classification framework is highly competitive and outperforms the state-of-the-art counterparts in low-resource settings across several languages.

preprint, 2021, arXiv

Rapid Multi-Physics Simulation for Electro-Thermal Origami Systems

Electro-thermally actuated origami provides a novel method for creating 3-D systems with advanced morphing and functional capabilities. However, it is currently difficult to simulate the multi-physical behavior of such systems because the electro-thermal actuation and large folding deformations are highly interdependent. In this work, we introduce a rapid multi-physics simulation framework for electro-thermally actuated origami systems that can simultaneously capture: thermo-mechanically coupled actuation, inter-panel contact, heat transfer, large deformation folding, and other complex loading applied onto the origami. Comparisons with finite element models validate the proposed framework for simulating origami heat transfer with different system geometries, materials, and surrounding environments. Verification of the simulated folding behaviors against physical electro-thermal micro-origami further demonstrates the validity of the proposed model. Simulations of more complex origami patterns and a case study for origami optimization are provided as application examples to show the capability and efficiency of the model. The framework provides a novel simulation tool for analysis, design, control, and optimization of active origami systems, pushing the boundary for feasible shape morphing and functional capability.

preprint, 2021, arXiv

Three-fold Weyl points in the Schrödinger operator with periodic potentials

Weyl points are degenerate points on the spectral bands at which energy bands intersect conically. They are the origins of many novel physical phenomena and have attracted much attention recently. In this paper, we investigate the existence of such points in the spectrum of the 3-dimensional Schrödinger operator $H = -\Delta + V(\mathbf{x})$ with $V(\mathbf{x})$ being in a large class of periodic potentials. Specifically, we give very general conditions on the potentials which ensure the existence of 3-fold Weyl points on the associated energy bands. Different from 2-dimensional honeycomb structures which possess Dirac points where two adjacent band surfaces touch each other conically, the 3-fold Weyl points are conical intersection points of two energy bands with an extra band sandwiched in between. To ensure the 3-fold and 3-dimensional conical structures, more delicate, new symmetries are required. As a consequence, new techniques combining more symmetries are used to justify the existence of such conical points under the conditions proposed. This paper provides a comprehensive proof of the existence of such 3-fold Weyl points. In particular, the role of each symmetry endowed to the potential is carefully analyzed. Our proof extends the analysis of conical spectral points to a higher dimension and higher multiplicities. We also provide some numerical simulations on typical potentials to demonstrate our analysis.

preprint, 2020, arXiv

Direct Measurement of Folding Angle and Strain Vector in Atomically thin WS$_2$ using Second Harmonic Generation

Structural engineering techniques such as local strain engineering and folding provide functional control over critical optoelectronic properties of 2D materials. Accurate monitoring of the local strain vector (both strain amplitude and direction) and folding angle in 2D materials is important to optimize device performance. Conventionally, the accurate measurement of both strain amplitude and direction requires the combined usage of multiple tools, such as atomic force microscopy (AFM), electron microscopy, Raman spectroscopy, etc. Here, we demonstrate the usage of a single tool, polarization-dependent second harmonic generation (SHG) imaging, to determine the folding angle and strain vector accurately in atomically thin tungsten disulfide (WS$_2$). We find that trilayer WS$_2$ folds with a folding angle of 60° show 9 times SHG enhancement due to vector superposition of SH wave vectors coming from the individual folding layers. Strain-dependent SHG quenching and enhancement are found parallel and perpendicular, respectively, to the direction of the compressive strain vector. However, despite a variation in strain angle, the total SHG remains constant, which allows us to determine the local strain vector accurately using a photoelastic approach. We also demonstrate that the band-nesting induced transition (C peak) can highly enhance SHG, which can be significantly modulated by strain. Our results pave the way to novel applications of TMDs in nonlinear optical devices.

preprint, 2020, arXiv

Global existence in critical spaces for non Newtonian compressible viscoelastic flows

We are interested in multi-dimensional compressible viscoelastic flows of Oldroyd type, which are among the non-Newtonian fluids exhibiting elastic behavior. In order to capture the damping effect of the additional deformation tensor, to the best of our knowledge, the "div-curl" structural condition plays a key role in previous efforts. The aim of this paper is to remove the structural condition and prove the global existence of strong solutions to compressible viscoelastic flows in critical spaces. The new ingredient lies in the introduction of the effective flux $(\theta, \mathcal{G})$, which enables us to capture the dissipation arising from the combination of density and deformation tensor. In the absence of compatibility conditions, a partial dissipation is found in non-Newtonian compressible fluids, which is weaker than that of the usual Navier-Stokes equations.

preprint, 2020, arXiv

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.

preprint, 2020, arXiv

Identification of hydrodynamic instability by convolutional neural networks

The onset of hydrodynamic instabilities is of great importance in both industry and daily life, due to the dramatic mechanical and thermodynamic changes for different types of flow motions. In this paper, modern machine learning techniques, especially convolutional neural networks (CNNs), are applied to identify the transition between different flow motions raised by hydrodynamic instability, as well as the critical non-dimensionalized parameters for characterizing this transition. The CNN not only correctly predicts the critical transition values for both Taylor-Couette (TC) flow and Rayleigh-Bénard (RB) convection under various setups and conditions, but also shows outstanding robustness and noise tolerance. In addition, key spatial features used for classifying different flow patterns are revealed by principal component analysis.

preprint 2020 arXiv

Improving Semantic Segmentation via Self-Training

Deep learning usually achieves the best results with complete supervision. In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models. In this paper, we show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm. We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data. Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performance on the Cityscapes, CamVid and KITTI datasets while requiring significantly less supervision. We also demonstrate the effectiveness of self-training on a challenging cross-domain generalization task, outperforming the conventional fine-tuning method by a large margin. Lastly, to alleviate the computational burden caused by the large amount of pseudo labels, we propose a fast training schedule that accelerates the training of segmentation models by up to 2x without performance degradation.
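The teacher, pseudo-label, joint-training loop described in this abstract can be illustrated with a toy sketch. This is not the paper's segmentation framework: the one-dimensional nearest-centroid classifier and the names fit_centroids, predict, and self_train are hypothetical stand-ins chosen only to keep the example self-contained.

```python
# Toy self-training sketch (assumption: a 1-D nearest-centroid classifier
# stands in for the segmentation network; all names are hypothetical).

def fit_centroids(points, labels):
    """Fit one centroid per class as the mean of that class's points."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def self_train(labeled_x, labeled_y, unlabeled_x):
    # Step 1: train the teacher on human-labeled data only.
    teacher = fit_centroids(labeled_x, labeled_y)
    # Step 2: the teacher generates pseudo labels for the unlabeled data.
    pseudo_y = [predict(teacher, x) for x in unlabeled_x]
    # Step 3: train the student on human and pseudo labels jointly.
    student = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return teacher, pseudo_y, student


teacher, pseudo_y, student = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 9.0])
```

In this toy setting the student's centroids shift toward the unlabeled points, which is the mechanism by which pseudo labels add training signal beyond the labeled set.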

preprint 2020 arXiv

LSBert: A Simple Framework for Lexical Simplification

Lexical simplification (LS) aims to replace complex words in a given sentence with simpler alternatives of equivalent meaning, in order to simplify the sentence. Recent unsupervised lexical simplification approaches rely only on the complex word itself, regardless of the given sentence, to generate candidate substitutions, which inevitably produces a large number of spurious candidates. In this paper, we propose LSBert, a lexical simplification framework based on the pretrained representation model BERT, that is capable of (1) making use of the wider context both when detecting the words in need of simplification and when generating substitute candidates, and (2) taking five high-quality features into account for ranking candidates, including BERT prediction order, a BERT-based language model, and the paraphrase database PPDB, in addition to the word frequency and word similarity commonly used in other LS methods. We show that our system outputs lexical simplifications that are grammatically correct and semantically appropriate, and obtains clear improvements over the baselines, outperforming the state-of-the-art by 29.8 accuracy points on three well-known benchmarks.

preprint 2020 arXiv

On Constructing Confidence Region for Model Parameters in Stochastic Gradient Descent via Batch Means

In this paper, we study a simple algorithm to construct asymptotically valid confidence regions for model parameters using the batch means method. The main idea is to cancel out the covariance matrix, which is hard or costly to estimate. In the process of developing the algorithm, we establish a process-level functional central limit theorem for stochastic gradient descent estimators based on Polyak-Ruppert averaging. We also extend the batch means method to accommodate more general batch size specifications.
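As a rough illustration of the batch means idea, the sketch below computes the classical equal-size batch means interval for a scalar parameter from a sequence of iterates. It is a simplified textbook version, not the algorithm studied in the paper: the function name batch_means_interval, the equal-size batching, and the caller-supplied critical_value (for example a Student-t quantile with num_batches - 1 degrees of freedom) are all assumptions made for the example.

```python
import math


def batch_means_interval(iterates, num_batches, critical_value):
    """Classical batch means interval for a scalar sequence (illustrative).

    Returns (center, half_width). No covariance estimate is needed: the
    spread of the batch means themselves provides the scale.
    """
    batch_size = len(iterates) // num_batches
    if num_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches with 1 iterate each")
    # Drop the earliest leftover iterates so all batches have equal size.
    used = iterates[len(iterates) - batch_size * num_batches:]
    means = [
        sum(used[i * batch_size:(i + 1) * batch_size]) / batch_size
        for i in range(num_batches)
    ]
    center = sum(means) / num_batches
    # Sample variance of the batch means (divisor num_batches - 1).
    variance = sum((m - center) ** 2 for m in means) / (num_batches - 1)
    half_width = critical_value * math.sqrt(variance / num_batches)
    return center, half_width


# Hypothetical usage on a short sequence with 4 batches of size 2.
center, half_width = batch_means_interval(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], num_batches=4, critical_value=2.0
)
```

The interval is then center plus or minus half_width. In practice the iterates would come from an SGD run after a burn-in period, and the critical value would be chosen to match the desired coverage level.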

preprint 2020 arXiv

ResNeSt: Split-Attention Networks

It is well known that feature-map attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture that applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image classification. In addition, ResNeSt has achieved superior transfer learning results on several public benchmarks when serving as the backbone, and has been adopted by the winning entries of the COCO-LVIS challenge. The source code for the complete system and the pretrained models are publicly available.

preprint 2020 arXiv

Super-transport of Excitons in Atomically Thin Organic Semiconductors at the 2D Quantum Limit

Long-range and fast transport of coherent excitons is important for the development of high-speed excitonic circuits and quantum computing applications. However, most of these coherent excitons have only been observed in some low-dimensional semiconductors when coupled with cavities, as there are large inhomogeneous broadening and dephasing effects on exciton transport in the native states of the materials. Here, by confining coherent excitons at the 2D quantum limit, we first observed molecular-aggregation-enabled super-transport of excitons in atomically thin two-dimensional (2D) organic semiconductors between coherent states, with a measured high effective exciton diffusion coefficient of 346.9 cm²/s at room temperature. This value is one to several orders of magnitude higher than the reported values from other organic molecular aggregates and low-dimensional inorganic materials. Without coupling to any optical cavities, the monolayer pentacene sample, a very clean 2D quantum system (1.2 nm thick) with high crystallinity (J-type aggregation) and minimal interfacial states, showed superradiant emissions from the Frenkel excitons, which was experimentally confirmed by the temperature-dependent photoluminescence (PL) emission, highly enhanced radiative decay rate, significantly narrowed PL peak width and strongly directional in-plane emission. The coherence in monolayer pentacene samples was observed to be delocalized over 135 molecules, which is significantly larger than the values (a few molecules) observed from other organic thin films. In addition, the super-transport of excitons in monolayer pentacene samples showed highly anisotropic behaviour. Our results pave the way for the development of future high-speed excitonic circuits, fast OLEDs, and other opto-electronic devices.

preprint 2020 arXiv

Testing kinetically coupled inflation models with CMB distortions

Inflation scenarios kinetically coupled with the Einstein tensor have been widely studied. They can be consistent with current observational data. Future measurements of CMB distortions will potentially extend information about the scalar spectrum to small scales $1\,\mathrm{Mpc}^{-1} \lesssim k \lesssim 2 \times 10^4\,\mathrm{Mpc}^{-1}$. By taking the sensitivity of the PIXIE experiment as the criterion, we perform a model-oriented analysis of the observational prospects of spectral distortions for kinetically coupled inflation. There are five models that possibly generate a detectable level of distortions, among the 49 single-field inflation models listed in Ref. \cite{Martin2013a}. These models are: hybrid inflation in the valley (VHI), non-canonical Kähler inflation (NCKI), generalized MSSM inflation (GMSSMI), generalized renormalization point inflation (GRIPI), and running-mass inflation (RMI). Each of these models can satisfy the Planck constraints on spectral tilt and lead to increased power on scales relevant for CMB distortions in a tuned region of their parameter space. The existence of kinetic coupling suppresses the value of the model parameters with mass dimension for VHI, GMSSMI, and GRIPI, such that these three models can be in agreement with their theoretical considerations. However, the tuned regions for all these models fail to satisfy the constraints on tensor modes.

preprint 2020 arXiv

Unfitted Nitsche's method for computing wave modes in topological materials

In this paper, we propose an unfitted Nitsche's method for computing wave modes in topological materials. The proposed method is based on Nitsche's technique and is used to study performance-enhanced topological materials that have strongly heterogeneous structures (e.g., the refractive index is piecewise constant with high contrasts). For periodic bulk materials, we use Floquet-Bloch theory and solve an eigenvalue problem on a torus with unfitted meshes. For materials with a line defect, a sufficiently large domain with zero boundary conditions is used to compute the localized eigenfunctions corresponding to the edge modes. The interfaces are handled by Nitsche's method on an unfitted uniform mesh. We prove that the proposed methods converge optimally, and present numerical examples to validate the theoretical results and demonstrate the capability of simulating topological materials.

preprint 2020 arXiv

Vision-Dialog Navigation by Exploring Cross-modal Memory

Vision-dialog navigation, posed as a new holy-grail task in the vision-language discipline, aims to learn an agent that can continually converse in natural language to ask for help and navigate according to human responses. Besides the common challenges faced in visual language navigation, vision-dialog navigation also requires handling the language intentions of a series of questions in the temporal context of the dialog history, and co-reasoning over both dialogs and visual scenes. In this paper, we propose the Cross-modal Memory Network (CMN) for remembering and understanding the rich information relevant to historical navigation actions. Our CMN consists of two memory modules, the language memory module (L-mem) and the visual memory module (V-mem). Specifically, L-mem learns latent relationships between the current language interaction and the dialog history by employing a multi-head attention mechanism. V-mem learns to associate the current visual views with the cross-modal memory about the previous navigation actions. The cross-modal memory is generated via a vision-to-language attention and a language-to-vision attention. Benefiting from the collaborative learning of L-mem and V-mem, our CMN is able to explore the memory of decision making in historical navigation actions that is relevant to the current step. Experiments on the CVDN dataset show that our CMN outperforms the previous state-of-the-art model by a significant margin in both seen and unseen environments.

preprint 2020 arXiv

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

Vision-Language Navigation (VLN) is a task where agents learn to navigate following natural language instructions. The key to this task is to perceive both the visual scene and natural language sequentially. Conventional approaches exploit the vision and language features in cross-modal grounding. However, the VLN task remains challenging, since previous works have neglected the rich semantic information contained in the environment (such as implicit navigation graphs or sub-trajectory semantics). In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information. The auxiliary tasks have four reasoning objectives: explaining the previous actions, estimating the navigation progress, predicting the next orientation, and evaluating the trajectory consistency. As a result, these additional training signals help the agent to acquire knowledge of semantic representations in order to reason about its activity and build a thorough perception of the environment. Our experiments indicate that auxiliary reasoning tasks improve both the performance of the main task and the model generalizability by a large margin. Empirically, we demonstrate that an agent trained with self-supervised auxiliary reasoning tasks substantially outperforms the previous state-of-the-art method, being the best existing approach on the standard benchmark.

preprint 2020 arXiv

Wave packets in the fractional nonlinear Schrödinger equation with a honeycomb potential

In this article, we study wave dynamics in the fractional nonlinear Schrödinger equation with a modulated honeycomb potential. This problem arises from recent research interest in the interplay between topological materials and nonlocal governing equations, both of which are current focuses of scientific research. We first develop the Floquet-Bloch spectral theory of the linear fractional Schrödinger operator with a honeycomb potential. In particular, we prove the existence of conical degenerate points, i.e., Dirac points, at which two dispersion band functions intersect. We then investigate the dynamics of wave packets spectrally localized at a Dirac point and derive the leading effective envelope equation. It turns out that the envelope can be described by a nonlinear Dirac equation with a varying mass. With rigorous error estimates, we demonstrate that the asymptotic solution based on the effective envelope equation approximates the true solution well in the weighted-$H^s$ space.

preprint 2020 arXiv

When Machine Learning Meets Multiscale Modeling in Chemical Reactions

Due to the intrinsic complexity and nonlinearity of chemical reactions, direct applications of traditional machine learning algorithms may face many difficulties. In this study, through two concrete examples with biological background, we illustrate how the key ideas of multiscale modeling can substantially reduce the computational cost of machine learning, as well as how machine learning algorithms perform model reduction automatically in a time-scale separated system. Our study highlights the necessity and effectiveness of integrating machine learning algorithms and multiscale modeling in the study of chemical reactions.

preprint 2019 arXiv

Wave packet dynamics in slowly modulated photonic graphene

Mathematical analysis of electromagnetic waves in photonic graphene, a photonic topological material with a honeycomb structure, is one of the most important current research topics. By modulating the honeycomb structure, numerous topological phenomena have been observed recently. The electromagnetic waves in such media are generally described by the 2-dimensional wave equation. It has been shown that the corresponding elliptic operator with a honeycomb material weight has Dirac points in its dispersion surfaces. In this paper, we study the time evolution of wave packets spectrally concentrated at such Dirac points in a modulated honeycomb material weight. We prove that such wave packet dynamics is governed by the Dirac equation with a varying mass over a large but finite time. Our analysis provides mathematical insights into those topological phenomena in photonic graphene.