Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
46 works
0 followers
21 topics
4 close collaborators

Published work

46 published item(s)

preprint 2026 arXiv

Simply Stabilizing the Loop via Fully Looped Transformer

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading additional computation for improved performance without increasing parameter count or context length. Because the number of loop iterations can be adjusted at inference, it also provides a natural mechanism for balancing performance and test-time compute. However, Looped Transformer still suffers from training instability when the number of loop iterations increases. Our analysis reveals that this instability stems from two sources: gradient oscillation and residual explosion. To address these two problems, we propose the Fully Looped Transformer, which introduces two parameter-free modifications: (1) Fully Looped Architecture, which distributes inter-loop signals across all layers to mitigate residual explosion; (2) Attention Injection, which reuses the existing attention block to suppress gradient oscillation. These modifications stabilize training dynamics, enabling the Fully Looped Transformer to be trained stably up to 12 loop iterations, whereas other baseline looped models collapse in this regime. In milder settings where Looped Transformer does not collapse, Fully Looped Transformer still improves average downstream-task performance by up to 13.2%. Overall, our experiments demonstrate that Fully Looped Transformer improves training stability, enhances downstream performance, and provides preliminary adaptability under different test-time compute budgets by varying loop iterations at inference.
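The weight reuse described in this abstract can be illustrated with a minimal sketch. It shows only the plain looping idea (one block applied repeatedly with shared parameters), not the Fully Looped Architecture or Attention Injection; the toy block, the weights `W`, and the function names are hypothetical stand-ins.

```python
import math

def make_block(weights):
    """Return a toy residual block that closes over one fixed weight matrix."""
    def block(x):
        # y = x + tanh(W x): a stand-in for attention + MLP with a residual path.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
        return [xi + hi for xi, hi in zip(x, h)]
    return block

def looped_forward(block, x, num_loops):
    """Apply the same block num_loops times; parameters are shared across loops."""
    for _ in range(num_loops):
        x = block(x)
    return x

W = [[0.5, -0.2], [0.1, 0.3]]  # hypothetical toy weights
block = make_block(W)
x0 = [1.0, -1.0]

# The loop count is a free choice at inference time; the parameter count is unchanged.
out_2 = looped_forward(block, x0, 2)
out_4 = looped_forward(block, x0, 4)
```

Raising `num_loops` spends more compute with the same parameters, which is the trade-off the abstract refers to.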

preprint 2023 arXiv

Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning

Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. However, identifying novel drug combinations through wet-lab experiments is resource intensive due to the vast combinatorial search space. Recently, computational approaches, specifically deep learning models, have emerged as an efficient way to discover synergistic combinations. While previous methods reported fair performance, their models usually do not take advantage of multi-modal data and are unable to handle new drugs or cell lines. In this study, we collected data from multiple datasets covering various drug-related aspects. We then use large-scale pre-trained models to generate informative representations and features for drugs, proteins, and diseases. On top of these representations, a message-passing graph is built to propagate information, with graph structure learning providing flexibility. Graph structure learning is introduced to biological networks for the first time here and enables us to generate pseudo-relations in the graph. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods on synergistic prediction benchmark datasets. The framework also generalizes to new drug combination data: on an independent test set released by AstraZeneca, it improves on previous methods by 10%. In addition, the framework is robust to unseen drugs, surpassing the second-best model by almost 15% AUROC. We believe our framework contributes to both the future wet-lab discovery of novel drugs and the building of promising guidance for precise combination medicine.

preprint 2022 arXiv

"Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs

Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from the labeled multi-action dialog samples. Due to data limitations, they generalize poorly toward unseen dialog flows. While interactive learning and reinforcement learning algorithms can be applied to incorporate external data sources of real users and user simulators, they take significant manual effort to build and suffer from instability. To address these issues, we propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics to enhance multi-action prediction. Our PEDP method employs model-based planning for conceiving what to express before deciding the current response through simulating single-action dialogs. Experimental results on the MultiWOZ dataset demonstrate that our fully supervised learning-based method achieves a solid task success rate of 90.6%, improving 3% compared to the state-of-the-art methods.

preprint 2022 arXiv

A physics and data co-driven surrogate modeling approach for temperature field prediction on irregular geometric domain

Thermal analysis plays a very important role in the whole aircraft structural optimization loop. However, directly applying traditional numerical analysis tools imposes a severe computational burden, especially when each optimization involves repeated parameter modification followed by thermal analysis. Recently, with the fast development of deep learning, several Convolutional Neural Network (CNN) surrogate models have been introduced to overcome this obstacle. However, for temperature field prediction on irregular geometric domains (TFP-IGD), CNNs are poorly suited, since most of them were designed to process regular images. To alleviate this difficulty, we propose a novel physics and data co-driven surrogate modeling method. First, after adapting the Bezier curve in geometric parameterization, a body-fitted coordinate mapping is introduced to generate coordinate transforms between the irregular physical plane and the regular computational plane. Second, a physics-driven CNN surrogate with partial differential equation (PDE) residuals as a loss function is utilized for fast meshing (meshing surrogate); then, we present a data-driven surrogate model based on the multi-level reduced-order method, aiming to learn solutions of the temperature field in the above regular computational plane (thermal surrogate). Finally, combining the grid position information provided by the meshing surrogate with the scalar temperature field information provided by the thermal surrogate (combined model), we reach an end-to-end surrogate model from geometric parameters to temperature field prediction on an irregular geometric domain. Numerical results demonstrate that our method can significantly improve prediction accuracy on a smaller dataset while reducing the training time when compared with other CNN methods.

preprint 2022 arXiv

A Virtual Reality-based Training and Assessment System for Bridge Inspectors with an Assistant Drone

Over 600,000 bridges in the U.S. must be inspected every two years to identify flaws, defects, or potential problems that may need follow-up maintenance. Bridge inspection has adopted unmanned aerial vehicles (or drones) for improving safety, efficiency, and cost-effectiveness. Although drones can operate in an autonomous mode, keeping inspectors in the loop is critical for complex tasks in bridge inspection. Therefore, inspectors need to develop the skill and confidence to operate drones in their jobs. This paper presents the design and development of a virtual reality-based training and assessment system for inspectors assisted by a drone in bridge inspection. The system is composed of four integrated modules: a simulated bridge inspection developed in Unity, an interface that allows a trainee to operate the drone in simulation using a remote controller, data monitoring and analysis to provide real-time, in-task feedback to trainees to assist their learning, and a post-study assessment supporting personalized training. The paper also conducts a proof-of-concept pilot study to illustrate the functionality of this system. The study demonstrated that TASBID, as a tool for the early-stage training, can objectively identify the training needs of individuals in detail and, further, help them develop the skill and confidence in collaborating with a drone in bridge inspection. The system has built a modeling and analysis platform for exploring advanced solutions to the human-drone cooperative inspection of civil infrastructure.

preprint 2022 arXiv

Active Learning for Open-set Annotation

Existing active learning studies typically work in the closed-set setting by assuming that all data examples to be labeled are drawn from known classes. However, in real annotation tasks, the unlabeled data usually contains a large number of examples from unknown classes, resulting in the failure of most active learning methods. To tackle this open-set annotation (OSA) problem, we propose a new active learning framework called LfOSA, which boosts the classification performance with an effective sampling strategy to precisely detect examples from known classes for annotation. The LfOSA framework introduces an auxiliary network to model the per-example max activation value (MAV) distribution with a Gaussian Mixture Model, which can dynamically select the examples with the highest probability of belonging to known classes in the unlabeled set. Moreover, by reducing the temperature $T$ of the loss function, the detection model will be further optimized by exploiting both known and unknown supervision. The experimental results show that the proposed method can significantly improve the selection quality of known classes, and achieve higher classification accuracy with lower annotation cost than state-of-the-art active learning methods. To the best of our knowledge, this is the first work of active learning for open-set annotation.
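The selection step described in this abstract can be sketched roughly as follows: fit a two-component Gaussian mixture to per-example MAVs and pick the examples with the highest posterior for one component. This is a simplified illustration, not the LfOSA implementation. It assumes the MAVs are already computed, that the component with the larger mean corresponds to known classes, and that a single one-dimensional mixture suffices; all function names and data values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_component_gmm(values, iterations=30):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                                  # start at the extremes
    spread = max(((hi - lo) / 4.0) ** 2, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in values:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate means, variances and mixing weights.
        for k in range(2):
            mass = sum(r[k] for r in resp) or 1e-300
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / mass
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / mass
            variances[k] = max(var_k, 1e-6)           # floor avoids a collapsed component
            weights[k] = mass / len(values)
    return means, resp

def select_likely_known(mavs, budget):
    """Return indices of the `budget` examples most likely to be known-class."""
    means, resp = fit_two_component_gmm(mavs)
    known = 0 if means[0] > means[1] else 1           # assumption: larger MAV = known class
    ranked = sorted(range(len(mavs)), key=lambda i: resp[i][known], reverse=True)
    return ranked[:budget]

# Hypothetical MAVs: indices 3, 4, 5 and 7 have clearly larger activations.
mavs = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95, 0.12, 0.88]
selected = select_likely_known(mavs, budget=4)
```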

preprint 2022 arXiv

Enhanced two-component superconductivity in CoSi2/TiSi2 heterojunctions

We report enhanced two-component superconductivity in (CoSi2/Si)/TiSi2 superconductor/normal-metal (S/N) heterojunctions. An enhanced superconducting transition temperature about twice that of CoSi2 and an upper critical field about 20 times bigger than that of epitaxial CoSi2/Si films were found. The tunneling spectra of three-terminal S/N junctions show pronounced zero-bias conductance peaks (ZBCPs) that signify penetration of odd-frequency, spin-triplet and even-parity Cooper pairs in TiSi2 from triplet dominant pairing in CoSi2/Si driven by symmetry reduction at the CoSi2/Si interface. Both the enhancement of the superconducting transition temperature and the ZBCPs are found to be more pronounced if TiSi2 is made more diffusive.

preprint 2022 arXiv

Genome-wide nucleotide-resolution model of single-strand break site reveals species evolutionary hierarchy

Single-strand breaks (SSBs) are the major DNA damage in the genome arising spontaneously as the outcome of genotoxins and intermediates of DNA transactions. SSBs play a crucial role in various biological processes and show a non-random distribution in the genome. Several SSB detection approaches such as S1 END-seq and SSiNGLe-ILM emerged to characterize the genomic landscape of SSB with nucleotide resolution. However, these sequencing-based methods are costly and unfeasible for large-scale analysis of diverse species. Thus, we proposed the first computational approach, SSBlazer, which is an explainable and scalable deep learning framework for genome-wide nucleotide-resolution SSB site prediction. We demonstrated that SSBlazer can accurately predict SSB sites and sufficiently alleviate false positives by constructing an imbalanced dataset to simulate the realistic SSB distribution. The model interpretation analysis reveals that SSBlazer captures the pattern of individual CpG in genomic context and the motif of TGCC in the center region as critical features. Besides, SSBlazer is a lightweight model with robust generalization ability in the cross-species evaluation, which enables large-scale genome-wide application in diverse species. Strikingly, the putative SSB genomic landscapes of 216 vertebrates reveal a negative correlation between SSB frequency and evolutionary hierarchy, suggesting that the genome tends toward greater integrity during evolution.

preprint 2022 arXiv

Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval

The task of hot-refresh model upgrades of image retrieval systems plays an essential role in the industry but has never been investigated in academia before. Conventional cold-refresh model upgrades can only deploy new models after the gallery is overall backfilled, taking weeks or even months for massive data. In contrast, hot-refresh model upgrades deploy the new model immediately and then gradually improve the retrieval accuracy by backfilling the gallery on-the-fly. Compatible training has made this possible; however, the problem of model regression with negative flips poses a great challenge to the stable improvement of user experience. We argue that it is mainly due to the fact that new-to-old positive query-gallery pairs may show less similarity than new-to-new negative pairs. To solve the problem, we introduce a Regression-Alleviating Compatible Training (RACT) method to properly constrain the feature compatibility while reducing negative flips. The core is to encourage the new-to-old positive pairs to be more similar than both the new-to-old negative pairs and the new-to-new negative pairs. An efficient uncertainty-based backfilling strategy is further introduced to hasten accuracy improvements. Extensive experiments on large-scale retrieval benchmarks (e.g., Google Landmark) demonstrate that our RACT effectively alleviates the model regression for one more step towards seamless model upgrades. The code will be available at https://github.com/binjiezhang/RACT_ICLR2022.
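One way to read the core constraint in this abstract is as a contrastive objective in which a new-model query is pulled toward its old-model positive and pushed away from both old-model and new-model negatives. The sketch below is a hypothetical softmax-style formulation chosen for illustration; it is not taken from the RACT code, and the function name, the dot-product similarity, and the `temperature` value are all assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def compatibility_loss(query_new, pos_old, negs_old, negs_new, temperature=0.1):
    """Contrastive loss over one new-model query feature.

    query_new: feature of the query from the new model
    pos_old:   old-model gallery feature of the same identity (new-to-old positive)
    negs_old:  old-model gallery features of other identities (new-to-old negatives)
    negs_new:  new-model gallery features of other identities (new-to-new negatives)
    """
    pos = math.exp(dot(query_new, pos_old) / temperature)
    neg = sum(math.exp(dot(query_new, n) / temperature) for n in negs_old + negs_new)
    # Small when the positive is more similar than every negative of either kind.
    return -math.log(pos / (pos + neg))

# Hypothetical unit-norm features.
query = [1.0, 0.0]
aligned = compatibility_loss(query, [1.0, 0.0], [[0.0, 1.0]], [[0.0, 1.0]])
regressed = compatibility_loss(query, [0.0, 1.0], [[1.0, 0.0]], [[1.0, 0.0]])
```

In the `regressed` case the negatives are more similar to the query than the positive, which is the situation the abstract associates with negative flips, and the loss is correspondingly larger.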

preprint 2022 arXiv

Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions

Non-coding RNA structure and function are essential to understanding various biological processes, such as cell signaling, gene expression, and post-transcriptional regulation. These are all among the core problems in the RNA field. With the rapid growth of sequencing technology, we have accumulated a massive amount of unannotated RNA sequences. On the other hand, expensive experimental observation yields only a limited amount of annotated data and 3D structures. Hence, it is still challenging to design computational methods for predicting their structures and functions. The lack of annotated data and systematic study causes inferior performance. To resolve the issue, we propose a novel RNA foundation model (RNA-FM) to take advantage of all the 23 million non-coding RNA sequences through self-supervised learning. Within this approach, we discover that the pre-trained RNA-FM could infer sequential and evolutionary information of non-coding RNAs without using any labels. Furthermore, we demonstrate RNA-FM's effectiveness by applying it to the downstream secondary/3D structure prediction, SARS-CoV-2 genome structure and evolution prediction, protein-RNA binding preference modeling, and gene expression regulation modeling. The comprehensive experiments show that the proposed method improves the RNA structural and functional modeling results significantly and consistently. Despite only being trained with unlabelled data, RNA-FM can serve as the foundational model for the field.

preprint 2022 arXiv

Knowledge-Grounded Dialogue Generation with a Unified Knowledge Representation

Knowledge-grounded dialogue systems are challenging to build due to the lack of training data and heterogeneous knowledge sources. Existing systems perform poorly on unseen topics due to limited topics covered in the training data. In addition, heterogeneous knowledge sources make it challenging for systems to generalize to other tasks because knowledge sources in different knowledge representations require different knowledge encoders. To address these challenges, we present PLUG, a language model that homogenizes different knowledge sources to a unified knowledge representation for knowledge-grounded dialogue generation tasks. PLUG is pre-trained on a dialogue generation task conditioned on a unified essential knowledge representation. It can generalize to different downstream knowledge-grounded dialogue generation tasks with a few training examples. The empirical evaluation on two benchmarks shows that our model generalizes well across different knowledge-grounded tasks. It can achieve comparable performance with state-of-the-art methods under a fully-supervised setting and significantly outperforms other methods in zero-shot and few-shot settings.

preprint 2022 arXiv

MixDefense: A Defense-in-Depth Framework for Adversarial Example Detection Based on Statistical and Semantic Analysis

Machine learning with deep neural networks (DNNs) has become one of the foundation techniques in many safety-critical systems, such as autonomous vehicles and medical diagnosis systems. DNN-based systems, however, are known to be vulnerable to adversarial examples (AEs) that are maliciously perturbed variants of legitimate inputs. While there has been a vast body of research to defend against AE attacks in the literature, the performances of existing defense techniques are still far from satisfactory, especially for adaptive attacks, wherein attackers are knowledgeable about the defense mechanisms and craft AEs accordingly. In this work, we propose a multilayer defense-in-depth framework for AE detection, namely MixDefense. For the first layer, we focus on those AEs with large perturbations. We propose to leverage the 'noise' features extracted from the inputs to discover the statistical difference between natural images and tampered ones for AE detection. For AEs with small perturbations, the inference result of such inputs would largely deviate from their semantic information. Consequently, we propose a novel learning-based solution to model such contradictions for AE detection. Both layers are resilient to adaptive attacks because there do not exist gradient propagation paths for AE generation. Experimental results with various AE attack methods on image classification datasets show that the proposed MixDefense solution outperforms the existing AE detection techniques by a considerable margin.

preprint 2022 arXiv

Multipole-fluctuation pairing mechanism of $d_{x^2-y^2}+ig$ superconductivity in Sr$_2$RuO$_4$

Despite many experimental and theoretical efforts, the pairing symmetry of superconductivity in Sr$_2$RuO$_4$ remains undecided. The accidentally degenerate $d_{x^2-y^2}+ig$ is consistent with most current experiments and seems to be one of the most probable candidates, but we still lack a satisfactory theoretical mechanism for its appearance. Here we construct a phenomenological model combining realistic electronic band structures and all symmetry-allowed multipole fluctuations as potential pairing glues, and make a systematic survey of major pairing states within the Eliashberg framework. Our calculations show that $d_{x^2-y^2}+ig$ can arise naturally from the interplay of antiferromagnetic, ferromagnetic, and electric multipole fluctuations whose coexistence is manifested in previous experiments and calculations. Our work provides a physically reasonable basis supporting the possibility of $d_{x^2-y^2}+ig$ pairing in superconducting Sr$_2$RuO$_4$.

preprint 2022 arXiv

Neutron Scattering Study of Fluctuating and Static Spin Correlations in the Anisotropic Spin Glass Fe$_2$TiO$_5$

The anisotropic spin glass transition, in which spin freezing is observed only along the c-axis in pseudobrookite Fe$_2$TiO$_5$, has long been perplexing because the Fe$^{3+}$ moments (d$^5$) are expected to be isotropic. Recently, neutron diffraction demonstrated that surfboard-shaped antiferromagnetic nanoregions coalesce above the glass transition temperature, T$_g$ $\approx$ 55 K, and a model was proposed in which the freezing of the fluctuations of the surfboards' magnetization leads to the anisotropic spin glass state. Given this new model, we have carried out high resolution inelastic neutron scattering measurements of the spin-spin correlations to understand the temperature dependence of the intra-surfboard spin dynamics on neutron (picosecond) time-scales. Here, we report on the temperature-dependence of the spin fluctuations measured from single crystal Fe$_2$TiO$_5$. Strong quasi-elastic magnetic scattering, arising from intra-surfboard correlations, is observed well above T$_g$. The spin fluctuations possess a steep energy-wave vector relation and are indicative of strong exchange interactions, consistent with the large Curie-Weiss temperature. As the temperature approaches T$_g$ from above, a shift in spectral weight from inelastic to elastic scattering is observed. At various temperatures between 4 K and 300 K, a characteristic relaxation rate of the fluctuations is determined. Despite the freezing of the majority of the spin correlations, an inelastic contribution remains even at base temperature, signifying the presence of fluctuating intra-surfboard spin correlations to at least T/T$_g$ $\approx$ 0.1 consistent with a description of Fe$_2$TiO$_5$ as a hybrid between conventional and geometrically frustrated spin glasses.

preprint 2022 arXiv

Parameter-robust Braess-Sarazin-type smoothers for linear elasticity problems

In this work, we propose three Braess-Sarazin-type multigrid relaxation schemes for solving linear elasticity problems, where the marker and cell scheme, a finite difference method, is used for the discretization. The three relaxation schemes are Jacobi-Braess-Sarazin, Mass-Braess-Sarazin, and Vanka-Braess-Sarazin. A local Fourier analysis (LFA) for the block-structured relaxation schemes is presented to study multigrid convergence behavior. From LFA, we derive the optimal LFA smoothing factor for each case. We obtain highly efficient smoothing factors, which are independent of Lamé constants. The Vanka-Braess-Sarazin relaxation scheme is the most efficient of the three. In each relaxation, a Schur complement system needs to be solved. Because a direct solve is often expensive, an inexact version is developed, where we simply use at most three weighted Jacobi iterations on the Schur complement system. Finally, two-grid and V-cycle multigrid performances are presented to validate our theoretical results. Our numerical results show that inexact versions can achieve the same performance as that of exact versions and our methods are robust to the Lamé constants.
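The inexact solve mentioned in this abstract relies on a few weighted Jacobi iterations, which update x by x + ω D⁻¹(b − A x), where D is the diagonal of A. A minimal sketch follows; the 3×3 matrix is only a hypothetical stand-in for a Schur complement system, and the weight `omega = 2/3` is an illustrative choice, not a value from the paper.

```python
import math

def weighted_jacobi(A, b, x, omega, iterations):
    """Run `iterations` sweeps of x <- x + omega * D^{-1} (b - A x)."""
    n = len(b)
    for _ in range(iterations):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * residual[i] / A[i][i] for i in range(n)]
    return x

def residual_norm(A, b, x):
    n = len(b)
    return math.sqrt(sum((b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2
                         for i in range(n)))

# Hypothetical small symmetric positive definite system.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x0 = [0.0, 0.0, 0.0]

# "At most three" sweeps: an approximate solve rather than an exact one.
x3 = weighted_jacobi(A, b, x0, omega=2.0 / 3.0, iterations=3)
```

After three sweeps the residual is reduced but not zero, which is the sense in which the solve is inexact.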

preprint 2022 arXiv

Rethinking Knowledge Distillation via Cross-Entropy

Knowledge Distillation (KD) has developed extensively and boosted various tasks. The classical KD method adds the KD loss to the original cross-entropy (CE) loss. We try to decompose the KD loss to explore its relation with the CE loss. Surprisingly, we find it can be regarded as a combination of the CE loss and an extra loss which has the identical form as the CE loss. However, we notice the extra loss forces the student's relative probability to learn the teacher's absolute probability. Moreover, the sum of the two probabilities is different, making it hard to optimize. To address this issue, we revise the formulation and propose a distributed loss. In addition, we utilize teachers' target output as the soft target, proposing the soft loss. Combining the soft loss and the distributed loss, we propose a new KD loss (NKD). Furthermore, we smooth students' target output to treat it as the soft target for training without teachers and propose a teacher-free new KD loss (tf-NKD). Our method achieves state-of-the-art performance on CIFAR-100 and ImageNet. For example, with ResNet-34 as the teacher, we boost the ImageNet Top-1 accuracy of ResNet18 from 69.90% to 71.96%. In training without teachers, MobileNet, ResNet-18 and SwinTransformer-Tiny achieve 70.04%, 70.76%, and 81.48%, which are 0.83%, 0.86%, and 0.30% higher than the baseline, respectively. The code is available at https://github.com/yzd-v/cls_KD.
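For reference, the classical formulation that this abstract starts from (a KD term added to the cross-entropy loss) can be sketched as below. This is the widely used temperature-scaled version, not the NKD or tf-NKD losses proposed in the paper; the weight `alpha`, the temperature `T`, and the example logits are hypothetical.

```python
import math

def softmax(logits, T=1.0):
    m = max(logits)                                   # subtract max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, target):
    return -math.log(softmax(student_logits)[target])

def kd_term(student_logits, teacher_logits, T):
    """T^2-scaled KL divergence between softened teacher and student outputs."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    return T * T * kl

def classical_kd_loss(student_logits, teacher_logits, target, alpha=1.0, T=4.0):
    """Cross-entropy on the hard label plus a weighted distillation term."""
    return cross_entropy(student_logits, target) + alpha * kd_term(student_logits, teacher_logits, T)

student = [2.0, 1.0, 0.1]   # hypothetical logits
teacher = [3.0, 0.5, 0.2]
loss = classical_kd_loss(student, teacher, target=0)
```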

preprint 2022 arXiv

Semantic Guided Single Image Reflection Removal

Reflection is common in images capturing scenes behind a glass window; it is not only a visual disturbance but also influences the performance of other computer vision algorithms. Single image reflection removal is an ill-posed problem because the color at each pixel needs to be separated into two values, i.e., the desired clear background and the reflection. To solve it, existing methods propose priors such as smoothness and color consistency. However, these low-level priors are not reliable in complex scenes; for instance, when capturing a real outdoor scene through a window, both the foreground and background contain smooth and sharp areas and a variety of colors. In this paper, inspired by the fact that humans can separate the two layers easily by recognizing the objects, we use object semantics as guidance to force the same semantic object to belong to the same layer. Extensive experiments on different datasets show that adding the semantic information offers a significant improvement to reflection separation. We also demonstrate the applications of the proposed method to other computer vision tasks.

preprint 2022 arXiv

Temporally Efficient Vision Transformer for Video Instance Segmentation

Recently, vision transformers have achieved tremendous success on image-level visual recognition tasks. To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS). Different from previous transformer-based VIS methods, TeViT is nearly convolution-free and contains a transformer backbone and a query-based video instance segmentation head. In the backbone stage, we propose a nearly parameter-free messenger shift mechanism for early temporal context fusion. In the head stages, we propose a parameter-shared spatiotemporal query interaction mechanism to build the one-to-one correspondence between video instances and queries. Thus, TeViT fully utilizes both frame-level and instance-level temporal context information and obtains strong temporal modeling capacity with negligible extra computational cost. On three widely adopted VIS benchmarks, i.e., YouTube-VIS-2019, YouTube-VIS-2021, and OVIS, TeViT obtains state-of-the-art results and maintains high inference speed, e.g., 46.6 AP with 68.9 FPS on YouTube-VIS-2019. Code is available at https://github.com/hustvl/TeViT.

preprint 2022 arXiv

Towards explainable artificial intelligence (XAI) for early anticipation of traffic accidents

Traffic accident anticipation is a vital function of Automated Driving Systems (ADSs) for providing a safety-guaranteed driving experience. An accident anticipation model aims to predict accidents promptly and accurately before they occur. Existing Artificial Intelligence (AI) models of accident anticipation lack a human-interpretable explanation of their decision-making. Although these models perform well, they remain a black box to ADS users, making it difficult to gain their trust. To this end, this paper presents a Gated Recurrent Unit (GRU) network that learns spatio-temporal relational features for the early anticipation of traffic accidents from dashcam video data. A post-hoc attention mechanism named Grad-CAM is integrated into the network to generate saliency maps as the visual explanation of the accident anticipation decision. An eye tracker captures human eye fixation points for generating human attention maps. The explainability of network-generated saliency maps is evaluated in comparison to human attention maps. Qualitative and quantitative results on a public crash dataset confirm that the proposed explainable network can anticipate an accident on average 4.57 seconds before it occurs, with 94.02% average precision. Furthermore, various post-hoc attention-based XAI methods are evaluated and compared. The comparison confirms that the Grad-CAM chosen by this study can generate high-quality, human-interpretable saliency maps (with 1.23 Normalized Scanpath Saliency) for explaining the crash anticipation decision. Importantly, results confirm that the proposed AI model, with a human-inspired design, can outperform humans in accident anticipation.
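Normalized Scanpath Saliency (NSS), the metric quoted in this abstract, is commonly computed by standardizing a saliency map to zero mean and unit standard deviation and then averaging the standardized values at human fixation locations. The sketch below follows that common definition; the paper's exact evaluation protocol is not given here, and the tiny map and fixation list are hypothetical.

```python
import math

def normalized_scanpath_saliency(saliency, fixations):
    """Mean z-scored saliency at the fixated (row, col) positions."""
    values = [v for row in saliency for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return sum((saliency[r][c] - mean) / std for r, c in fixations) / len(fixations)

# Hypothetical 2x2 saliency map with one salient pixel.
saliency_map = [[0.0, 0.0],
                [0.0, 1.0]]

nss_on_peak = normalized_scanpath_saliency(saliency_map, [(1, 1)])   # fixation on the peak
nss_off_peak = normalized_scanpath_saliency(saliency_map, [(0, 0)])  # fixation elsewhere
```

A positive value means the map assigns above-average saliency to the locations people looked at; a negative value means the opposite.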

preprint 2022 arXiv

Towards Vivid and Diverse Image Colorization with Generative Color Prior

Colorization has attracted increasing interest in recent years. Classic reference-based methods usually rely on external color images for plausible results. A large image database or online search engine is inevitably required for retrieving such exemplars. Recent deep-learning-based methods could automatically colorize images at a low cost. However, they are always accompanied by unsatisfactory artifacts and incoherent colors. In this work, we propose GCP-Colorization that leverages the rich and diverse color priors encapsulated in a pretrained Generative Adversarial Network (GAN) for automatic colorization. Specifically, we first "retrieve" matched features (similar to exemplars) via a GAN encoder and then incorporate these features into the colorization process with feature modulations. Thanks to the powerful generative color prior (GCP) and delicate designs, our GCP-Colorization could produce vivid colors with a single forward pass. Moreover, it is highly convenient to obtain diverse results by modifying GAN latent codes. GCP-Colorization also inherits the merit of interpretable controls of GANs and could attain controllable and smooth transitions by walking through GAN latent space. Extensive experiments and user studies demonstrate that GCP-Colorization outperforms previous works. Codes are available at https://github.com/ToTheBeginning/GCP-Colorization.

preprint 2022 arXiv

Unseasonal super ionospheric plasma bubble and scintillations seeded by the 2022 Tonga Volcano Eruption related perturbations

The Hunga-Tonga volcano eruption at 04:14:45 UT on 15 January 2022 produced various waves propagating globally, disturbing the background atmosphere and ionosphere. Coinciding with the arrival of perturbation waves, several equatorial plasma bubbles (EPBs) were consecutively generated at post-sunset hours over the East/Southeast Asian region, with the largest extension to middle latitudes. These EPBs caused intense L-band amplitude scintillations at middle-to-low latitudes, with signal fading depths up to ~16 dB. Considering the very rare occurrence of EPBs during this season in the East/Southeast Asian sector and the significantly modulated background ionosphere, we believe that the perturbation waves launched by the volcano eruption triggered the generation of unseasonal super EPBs. The ionospheric perturbations linked with the 2022 Tonga volcano eruption propagated coincidently through the East/Southeast Asia longitude sector near sunset, modulated the equatorial F region bottomside plasma density and acted as the seeding source for the generation of unseasonal super bubbles. Our results imply that a volcano eruption could indirectly affect satellite communication links in a region more than ten thousand kilometers away.

preprint2022arXiv

Using Chatbots to Teach Languages

This paper reports on progress towards building an online language learning tool to provide learners with conversational experience by using dialog systems as conversation practice partners. Our system can adapt to users' language proficiency on the fly. We also provide automatic grammar error feedback to help users learn from their mistakes. According to our first adopters, our system is entertaining and useful. Furthermore, we will provide the learning technology community a large-scale conversation dataset on language learning and grammar correction. Our next step is to make our system more adaptive to user profile information by using reinforcement learning algorithms.

preprint2022arXiv

What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains, e.g., autonomous driving. While there has been a vast body of AE defense solutions, to the best of our knowledge, they all suffer from some weaknesses, e.g., defending against only a subset of AEs or causing a relatively high accuracy loss for legitimate inputs. Moreover, most existing solutions cannot defend against adaptive attacks, wherein attackers are knowledgeable about the defense mechanisms and craft AEs accordingly. In this paper, we propose a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model. To be specific, the proposed solution, namely ContraNet, models such contradiction by first taking both the input and the inference result to a generator to obtain a synthetic output and then comparing it against the original input. For legitimate inputs that are correctly inferred, the synthetic output tries to reconstruct the input. On the contrary, for AEs, instead of reconstructing the input, the synthetic output would be created to conform to the wrong label whenever possible. Consequently, by measuring the distance between the input and the synthetic output with metric learning, we can differentiate AEs from legitimate inputs. We perform comprehensive evaluations under various AE attack scenarios, and experimental results show that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks. Moreover, our analysis shows that successful AEs that can bypass ContraNet tend to have much-weakened adversarial semantics. We have also shown that ContraNet can be easily combined with adversarial training techniques to achieve further improved AE defense capabilities.

preprint2021arXiv

CLiMP: A Benchmark for Chinese Language Model Evaluation

Linguistically informed analyses of language models (LMs) contribute to the understanding and improvement of these models. Here, we introduce the corpus of Chinese linguistic minimal pairs (CLiMP), which can be used to investigate what knowledge Chinese LMs acquire. CLiMP consists of sets of 1,000 minimal pairs (MPs) for 16 syntactic contrasts in Mandarin, covering 9 major Mandarin linguistic phenomena. The MPs are semi-automatically generated, and human agreement with the labels in CLiMP is 95.8%. We evaluated 11 different LMs on CLiMP, covering n-grams, LSTMs, and Chinese BERT. We find that classifier-noun agreement and verb complement selection are the phenomena that models generally perform best at. However, models struggle the most with the ba construction, binding, and filler-gap dependencies. Overall, Chinese BERT achieves an 81.8% average accuracy, while the performances of LSTMs and 5-grams are only moderately above chance level.

preprint2021arXiv

Orbital-Selective High-Temperature Cooper Pairing Developed in the Two-Dimensional Limit

The orbital multiplicity in multiband superconductors yields orbital differentiation in normal-state properties, and can lead to orbital-selective spin-fluctuation Cooper pairing. This phenomenon has become increasingly pivotal in clarifying the pairing 'enigma' particularly for multiband high-temperature superconductors. In one-unit-cell (1-UC) FeSe/SrTiO3, the thinnest and highest-Tc member of iron-based superconductors, the standard electron-hole Fermi pocket nesting scenario is apparently not applicable since the Gamma-centered hole pockets are absent, so the actual pairing mechanism is the subject of intense debate. Here, by measuring high-resolution Bogoliubov quasiparticle interference, we report observations of highly anisotropic magnetic Cooper pairing in 1-UC FeSe. From a theoretical point of view, it is important to incorporate effects of electronic correlations within a spin-fluctuation pairing calculation, where the dxy orbital becomes coherence-suppressed. The resulting pairing gap is compatible with the experimental findings, which suggests that high-Tc Cooper pairing with orbital selectivity applies to 1-UC FeSe. Our findings imply the general existence of orbital selectivity in iron-based superconductors and the universal importance of electron correlations in high-Tc superconductors.

preprint2021arXiv

Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance

Removing undesired reflection from an image captured through a glass surface is a very challenging problem with many practical application scenarios. To improve reflection removal, cascaded deep models are usually adopted to estimate the transmission in a progressive manner. However, most existing methods are still limited in exploiting the result of the prior stage to guide transmission estimation. In this paper, we present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR). To be specific, the reflection layer is estimated first because it is generally much simpler and relatively easier to estimate. A reflection-aware guidance (RAG) module is then developed to better exploit the estimated reflection when predicting the transmission layer. By incorporating feature maps from the estimated reflection and observation, RAG can be used (i) to mitigate the effect of reflection from the observation, and (ii) to generate a mask in partial convolution to mitigate the effect of deviating from the linear combination hypothesis. A dedicated mask loss is further presented for reconciling the contributions of encoder and decoder features. Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods. The source code and pre-trained model are available at https://github.com/liyucs/RAGNet.

preprint2020arXiv

A new generalized inverse of matrices from core-EP decomposition

A new generalized inverse for a square matrix $H\in\mathbb{C}^{n\times n}$, called CCE-inverse, is established by the core-EP decomposition and Moore-Penrose inverse $H^\dagger$. We propose some characterizations of the CCE-inverse. Furthermore, two canonical forms of the CCE-inverse are presented. At last, we introduce the definitions of CCE-matrices and $k$-CCE matrices, and prove that CCE-matrices are the same as $i$-EP matrices studied by Wang and Liu in [The weak group matrix, Aequationes Mathematicae, 93(6): 1261-1273, 2019].

preprint2020arXiv

Ancient solutions to the Ricci flow with isotropic curvature conditions

We show that every $n$-dimensional, $\kappa$-noncollapsed, noncompact, complete ancient solution to the Ricci flow with uniformly PIC for $n=4$ or $n\ge 12$ has weakly PIC$_2$ and bounded curvature. Combining this with earlier results, we prove that any such solution is isometric to either a family of shrinking cylinders (or a quotient thereof) or the Bryant soliton. Also, we classify all complex 2-dimensional, $\kappa$-noncollapsed, complete ancient solutions to the Kähler Ricci flow with weakly PIC.

preprint2020arXiv

Attention Guided Low-light Image Enhancement with a Large Scale Low-light Simulation Dataset

Low-light image enhancement is challenging in that it needs to consider not only brightness recovery but also complex issues like color distortion and noise, which usually hide in the dark. Simply adjusting the brightness of a low-light image will inevitably amplify those artifacts. To address this difficult problem, this paper proposes a novel end-to-end attention-guided method based on multi-branch convolutional neural network. To this end, we first construct a synthetic dataset with carefully designed low-light simulation strategies. The dataset is much larger and more diverse than existing ones. With the new dataset for training, our method learns two attention maps to guide the brightness enhancement and denoising tasks respectively. The first attention map distinguishes underexposed regions from well lit regions, and the second attention map distinguishes noises from real textures. With their guidance, the proposed multi-branch decomposition-and-fusion enhancement network works in an input adaptive way. Moreover, a reinforcement-net further enhances color and contrast of the output image. Extensive experiments on multiple datasets demonstrate that our method can produce high fidelity enhancement results for low-light images and outperforms the current state-of-the-art methods by a large margin both quantitatively and visually.

preprint2020arXiv

Classification Calibration for Long-tail Instance Segmentation

Remarkable progress has been made in object instance detection and segmentation in recent years. However, existing state-of-the-art methods are mostly evaluated with fairly balanced and class-limited benchmarks, such as Microsoft COCO dataset [8]. In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-tail training data based on the LVIS [5] dataset, and find a major cause is the inaccurate classification of object proposals. Based on this observation, we propose to calibrate the prediction of classification head to improve recognition performance for the tail classes. Without much additional cost and modification of the detection model architecture, our calibration method improves the performance of the baseline by a large margin on the tail classes. Codes will be available. Importantly, after the submission, we find significant improvement can be further achieved by modifying the calibration head, which we will update later.

preprint2020arXiv

Dual Semantic Fusion Network for Video Object Detection

Video object detection is a tough task due to the deteriorated quality of video sequences captured under complex environments. Currently, this area is dominated by a series of feature enhancement based methods, which distill beneficial semantic information from multiple frames and generate enhanced features through fusing the distilled information. However, the distillation and fusion operations are usually performed at either frame level or instance level with external guidance using additional information, such as optical flow and feature memory. In this work, we propose a dual semantic fusion network (abbreviated as DSFNet) to fully exploit both frame-level and instance-level semantics in a unified fusion framework without external guidance. Moreover, we introduce a geometric similarity measure into the fusion process to alleviate the influence of information distortion caused by noise. As a result, the proposed DSFNet can generate more robust features through the multi-granularity fusion and avoid being affected by the instability of external guidance. To evaluate the proposed DSFNet, we conduct extensive experiments on the ImageNet VID dataset. Notably, the proposed dual semantic fusion network achieves, to the best of our knowledge, the best performance of 84.1% mAP among the current state-of-the-art video object detectors with ResNet-101 and 85.4% mAP with ResNeXt-101 without using any post-processing steps.

preprint2020arXiv

Fast Video Object Segmentation using the Global Context Module

We developed a real-time, high-quality semi-supervised video object segmentation algorithm. Its accuracy is on par with the most accurate, time-consuming online-learning model, while its speed is similar to the fastest template-matching method with sub-optimal accuracy. The core component of the model is a novel global context module that effectively summarizes and propagates information through the entire video. Compared to previous approaches that only use one frame or a few frames to guide the segmentation of the current frame, the global context module uses all past frames. Unlike the previous state-of-the-art space-time memory network that caches a memory at each spatio-temporal position, the global context module uses a fixed-size feature representation. Therefore, it uses constant memory regardless of the video length and costs substantially less memory and computation. With the novel module, our model achieves top performance on standard benchmarks at a real-time speed.

preprint2020arXiv

Hyperspectral City V1.0 Dataset and Benchmark

This document introduces the background and the usage of the Hyperspectral City Dataset and the benchmark. The documentation starts with the background and motivation of the dataset. Following that, we briefly describe the method of collecting the dataset and the processing method from the raw dataset to the final release dataset, specifically version 1.0. We also provide the detailed usage of the dataset and the evaluation metric for submitting results to the 2019 Hyperspectral City Challenge.

preprint2020arXiv

Learning to Stop While Learning to Predict

There is a recent surge of interest in designing deep architectures based on the update steps in traditional algorithms, or learning neural networks to improve and replace traditional algorithms. While traditional algorithms have certain stopping criteria for outputting results at different iterations, many algorithm-inspired deep models are restricted to a "fixed-depth" for all inputs. Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances, either to avoid "over-thinking", or because we want to compute less for operations converged already. In this paper, we tackle this varying depth problem using a steerable architecture, where a feed-forward deep model and a variational stopping policy are learned together to sequentially determine the optimal number of layers for each input instance. Training such architecture is very challenging. We provide a variational Bayes perspective and design a novel and effective training procedure which decomposes the task into an oracle model learning stage and an imitation stage. Experimentally, we show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks, including learning sparse recovery, few-shot meta learning, and computer vision tasks.

preprint2020arXiv

On the regular-convexity of Ricci shrinker limit spaces

In this paper, we study the structure of the pointed-Gromov-Hausdorff limits of sequences of Ricci shrinkers. We define a regular-singular decomposition following the work of Cheeger-Colding for manifolds with a uniform Ricci curvature lower bound, and prove that the regular part of any Ricci shrinker limit space is convex, inspired by Colding-Naber's original idea of parabolic smoothing of the distance functions.

preprint2020arXiv

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax

Solving long-tail large vocabulary object detection with deep learning based models is a challenging and demanding task, which is however under-explored. In this work, we provide the first systematic analysis on the underperformance of state-of-the-art models in front of long-tail distribution. We find existing detection methods are unable to model few-shot classes when the dataset is extremely skewed, which can result in classifier imbalance in terms of parameter magnitude. Directly adapting long-tail classification models to detection frameworks cannot solve this problem due to the intrinsic difference between detection and classification. In this work, we propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training. It implicitly modulates the training process for the head and tail classes and ensures they are both sufficiently trained, without requiring any extra sampling for the instances from the tail classes. Extensive experiments on the very recent long-tail large vocabulary object recognition benchmark LVIS show that our proposed BAGS significantly improves the performance of detectors with various backbones and frameworks on both object detection and instance segmentation. It beats all state-of-the-art methods transferred from long-tail image classification and establishes new state-of-the-art. Code is available at https://github.com/FishYuLi/BalancedGroupSoftmax.

preprint2020arXiv

RNA Secondary Structure Prediction By Learning Unrolled Algorithms

In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction which can effectively take into account the inherent constraints in the problem. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold: it predicts significantly better structures compared to previous SOTA (especially for pseudoknotted structures), while being as efficient as the fastest algorithms in terms of inference time.

preprint2020arXiv

Topological Hall effect and the magnetic states of Nowotney chimney ladder compound Cr$_{11}$Ge$_{19}$

We have investigated the magnetic and charge transport properties of single crystals of Nowotney Chimney Ladder compound Cr$_{11}$Ge$_{19}$ and mapped out a comprehensive phase diagram reflecting the complicated interplay between the Dzyaloshinskii-Moriya (DM) interaction, the dipolar interaction, and the magnetic anisotropy. We have identified a set of interesting magnetic phases and attributed a finite topological Hall effect to the recently discovered bi-skyrmion phase. These data also suggest the existence of an anti-skyrmion state at finite fields for temperatures just below the magnetic ordering temperature, $T_c$, as indicated by a distinct change in sign of the topological Hall effect. Above $T_c$, we discovered a region of enhanced magnetic response corresponding to a disordered phase likely existing near the ferromagnetic critical point under small magnetic fields. Strong spin chirality fluctuations are demonstrated by the large value of the topological Hall resistivity persisting up to 1 T which is most likely due to the existence of the DM interaction. We argue that changes to the topological Hall effect correspond to different topological spin textures that are controlled by magnetic dipolar and DM interactions that vary in importance with temperature.

preprint2020arXiv

Towards Structured Prediction in Bioinformatics with Deep Learning

Using machine learning, especially deep learning, to facilitate biological research is a fascinating research direction. However, in addition to the standard classification or regression problems, in bioinformatics, we often need to predict more complex structured targets, such as 2D images and 3D molecular structures. The above complex prediction tasks are referred to as structured prediction. Structured prediction is more complicated than the traditional classification but has much broader applications, considering that most of the original bioinformatics problems have complex output objects. Due to the properties of those structured prediction problems, such as having problem-specific constraints and dependency within the labeling space, the straightforward application of existing deep learning models can lead to unsatisfactory results. Here, we argue that the following ideas can help resolve structured prediction problems in bioinformatics. Firstly, we can combine deep learning with other classic algorithms, such as probabilistic graphical models, which model the problem structure explicitly. Secondly, we can design the problem-specific deep learning architectures or methods by considering the structured labeling space and problem constraints, either explicitly or implicitly. We demonstrate our ideas with six projects from four bioinformatics subfields, including sequencing analysis, structure prediction, function annotation, and network analysis. The structured outputs cover 1D signals, 2D images, 3D structures, hierarchical labeling, and heterogeneous networks. With the help of the above ideas, all of our methods can achieve SOTA performance on the corresponding problems. The success of these projects motivates us to extend our work towards other more challenging but important problems, such as health-care problems, which can directly benefit people's health and wellness.

preprint2020arXiv

Transient Grating Spectroscopy of Photocarrier Dynamics in Semiconducting Polymer Thin Films

While charge carrier dynamics and thermal management are both keys to the operational efficiency and stability for energy-related devices, experimental techniques that can simultaneously characterize both properties are still lacking. In this paper, we use laser-induced transient grating (TG) spectroscopy to characterize thin films of the archetypal organic semiconductor regioregular poly(3-hexylthiophene) (P3HT) and its blends with the electron acceptor [6,6]-phenyl-C61-butyric acid methyl ester (PCBM) on glass substrates. While the thermal response is determined to be dominated by the substrates, we show that the recombination dynamics of photocarriers in the organic semiconductor thin films occur on a similar timescale and can be separated from the thermal response. Our measurements indicate that the photocarrier dynamics are determined by multiple recombination processes and our extracted recombination rates are in good agreement with previous reports using other techniques. We further apply TG spectroscopy to characterize another conjugated polymer and a molecular fluorescent material to demonstrate its general applicability. Our study indicates the potential of transient grating spectroscopy to simultaneously characterize thermal transport and photocarrier dynamics in organic optoelectronic devices.

preprint2020arXiv

Unsupervised Learning for Intrinsic Image Decomposition from a Single Image

Intrinsic image decomposition, which is an essential task in computer vision, aims to infer the reflectance and shading of the scene. It is challenging since it needs to separate one image into two components. To tackle this, conventional methods introduce various priors to constrain the solution, yet with limited performance. Meanwhile, the problem is typically solved by supervised learning methods, which is actually not an ideal solution since obtaining ground truth reflectance and shading for massive general natural scenes is challenging and even impossible. In this paper, we propose a novel unsupervised intrinsic image decomposition framework, which relies on neither labeled training data nor hand-crafted priors. Instead, it directly learns the latent feature of reflectance and shading from unsupervised and uncorrelated data. To enable this, we explore the independence between reflectance and shading, the domain invariant content constraint and the physical constraint. Extensive experiments on both synthetic and real image datasets demonstrate consistently superior performance of the proposed method.

preprint2019arXiv

Spin fluctuation anisotropy as a probe of orbital-selective hole-electron quasiparticle excitations in detwinned Ba(Fe1-xCox)2As2

We use inelastic neutron scattering to study spin excitation anisotropy in mechanically detwinned Ba(Fe1-xCox)2As2 with x = 0.048 and 0.054. Both samples exhibit a tetragonal-to-orthorhombic structural transition at Ts, a collinear static antiferromagnetic (AF) order at wave vector Q1 = QAF = (1, 0) below the Neel temperature TN, and superconductivity below Tc (Ts > TN > Tc). In the high temperature paramagnetic tetragonal phase (T > Ts), spin excitations centered at Q1 and Q2 = (0, 1) are gapless and have four-fold (C4) rotational symmetry. On cooling to below TN but above Tc, spin excitations become highly anisotropic, developing a gap at Q2 but still are gapless at Q1. Upon entering into the superconducting state, a neutron spin resonance appears at Q1 with no magnetic scattering at Q2. By comparing these results with those from angle resolved photoemission spectroscopy experiments, we conclude that the anisotropic shift of the dyz and dxz bands in detwinned Ba(Fe1-xCox)2As2 below Ts is associated with the spin excitation anisotropy, and the superconductivity-induced resonance arises from the electron-hole Fermi surface nesting of quasiparticles with the dyz orbital characters.

preprint2019arXiv

Tuneable terahertz oscillation arising from Bloch-point dynamics in chiral magnets

Skyrmionic textures are being extensively investigated due to the occurrence of novel topological magnetic phenomena and their promising applications in a new generation of spintronic devices that take advantage of the robust topological stability of their spin structures. The development of practical devices relies on a detailed understanding of how skyrmionic structures can be formed, transferred, detected and annihilated. In this work, our considerations go beyond static skyrmions and theoretically show that the formation/annihilation of both skyrmions and antiskyrmions is enabled by the transient creation and propagation of topological singularities (magnetic monopole-like Bloch points). Critically, during the winding/unwinding of skyrmionic textures, our results predict that the Bloch-point propagation will give rise to an emergent electric field in a terahertz frequency range and with substantial amplitude. We also demonstrate ways for controlling Bloch-point dynamics, which directly enable the tuneability on both frequency and amplitude of this signal. Our studies provide a concept of directly exploiting topological singularities for terahertz skyrmion-based electronic devices.

preprint2018arXiv

c-axis pressure induced antiferromagnetic order in optimally P-doped BaFe2(As0.70P0.30)2 superconductor

Superconductivity in BaFe2(As1-xPx)2 iron pnictides emerges when its in-plane two-dimensional (2D) orthorhombic lattice distortion associated with the nematic phase at Ts and three-dimensional (3D) collinear antiferromagnetic (AF) order at TN (Ts = TN) are gradually suppressed with increasing x, reaching optimal superconductivity around x = 0.30 with Tc $\approx$ 30 K. Here we show that a moderate uniaxial pressure along the c-axis in BaFe2(As0.70P0.30)2 spontaneously induces a 3D collinear AF order with TN = Ts > 30 K, while only slightly suppressing Tc. Although a ~400 MPa pressure compresses the c-axis lattice while expanding the in-plane lattice and increasing the nearest-neighbor Fe-Fe distance, it barely changes the average iron-pnictogen height in BaFe2(As0.70P0.30)2. Therefore, the pressure-induced AF order must arise from a strong in-plane magnetoelastic coupling, suggesting that the 2D nematic phase is a competing state with superconductivity.