Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
38works
0followers
20topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

38 published item(s)

preprint2026arXiv

AdaMorph: Unified Motion Retargeting via Embodiment-Aware Adaptive Transformers

Retargeting human motion to heterogeneous robots is a fundamental challenge in robotics, primarily due to the severe kinematic and dynamic discrepancies between varying embodiments. Existing solutions typically resort to training embodiment-specific models, which scales poorly and fails to exploit shared motion semantics. To address this, we present AdaMorph, a unified neural retargeting framework that enables a single model to adapt human motion to diverse robot morphologies. Our approach treats retargeting as a conditional generation task. We map human motion into a morphology-agnostic latent intent space and utilize a dual-purpose prompting mechanism to condition the generation. Instead of simple input concatenation, we leverage Adaptive Layer Normalization (AdaLN) to dynamically modulate the decoder's feature space based on embodiment constraints. Furthermore, we enforce physical plausibility through a curriculum-based training objective that ensures orientation and trajectory consistency via integration. Experimental results on 12 distinct humanoid robots demonstrate that AdaMorph effectively unifies control across heterogeneous topologies, exhibiting strong zero-shot generalization to unseen complex motions while preserving the dynamic essence of the source behaviors.

preprint2026arXiv

Structure-Centric Graph Foundation Model via Geometric Bases

Graph foundation models (GFMs) seek transferable representations across graph domains but are limited by structural heterogeneity and incompatible node feature spaces. We propose Structure-Centric Graph Foundation Models (SCGFM), which treat graph topology as the primary source of transferable knowledge. Modeling graphs as metric measure spaces, SCGFM introduces learnable geometric bases that define a shared structural coordinate system. Graphs are aligned to these bases via Gromov-Wasserstein distances, yielding structure-aligned latent representations that accommodate heterogeneous graph topologies. To address feature incompatibility, SCGFM employs a structure-aware feature re-encoding mechanism that unifies node representations without assuming a fixed feature dimensionality or requiring dataset-specific preprocessing. Experiments on graph- and node-level tasks demonstrate strong in-domain and cross-domain generalization, outperforming existing GFM approaches.

preprint2022arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

preprint2022arXiv

AutoQGS: Auto-Prompt for Low-Resource Knowledge-based Question Generation from SPARQL

This study investigates the task of knowledge-based question generation (KBQG). Conventional KBQG works generated questions from fact triples in the knowledge graph, which could not express complex operations like aggregation and comparison in SPARQL. Moreover, due to the costly annotation of large-scale SPARQL-question pairs, KBQG from SPARQL under low-resource scenarios urgently needs to be explored. Recently, since the generative pre-trained language models (PLMs) typically trained in natural language (NL)-to-NL paradigm have been proven effective for low-resource generation, e.g., T5 and BART, how to effectively utilize them to generate NL-question from non-NL SPARQL is challenging. To address these challenges, AutoQGS, an auto-prompt approach for low-resource KBQG from SPARQL, is proposed. Firstly, we put forward to generate questions directly from SPARQL for the KBQG task to handle complex operations. Secondly, we propose an auto-prompter trained on large-scale unsupervised data to rephrase SPARQL into NL description, smoothing the low-resource transformation from non-NL SPARQL to NL question with PLMs. Experimental results on the WebQuestionsSP, ComlexWebQuestions 1.1, and PathQuestions show that our model achieves state-of-the-art performance, especially in low-resource settings. Furthermore, a corpus of 330k factoid complex question-SPARQL pairs is generated for further KBQG research.

preprint2022arXiv

BORT: Back and Denoising Reconstruction for End-to-End Task-Oriented Dialog

A typical end-to-end task-oriented dialog system transfers context into dialog state, and upon which generates a response, which usually faces the problem of error propagation from both previously generated inaccurate dialog states and responses, especially in low-resource scenarios. To alleviate these issues, we propose BORT, a back and denoising reconstruction approach for end-to-end task-oriented dialog system. Squarely, to improve the accuracy of dialog states, back reconstruction is used to reconstruct the original input context from the generated dialog states since inaccurate dialog states cannot recover the corresponding input context. To enhance the denoising capability of the model to reduce the impact of error propagation, denoising reconstruction is used to reconstruct the corrupted dialog state and response. Extensive experiments conducted on MultiWOZ 2.0 and CamRest676 show the effectiveness of BORT. Furthermore, BORT demonstrates its advanced capabilities in the zero-shot domain and low-resource scenarios.

preprint2022arXiv

Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis

Building Spoken Language Understanding (SLU) robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants. Considering that most ASR errors are caused by phonetic confusion between similar-sounding expressions, intuitively, leveraging the phoneme sequence of speech can complement ASR hypothesis and enhance the robustness of SLU. This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). The cross attention block is devised to catch the fine-grained interactions between phoneme and word embeddings in order to make the joint representations catch the phonetic and semantic features of input simultaneously and for overcoming the ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments are conducted on three datasets, showing the effectiveness and competitiveness of our approach. Additionally, We also validate the universality of CASLU and prove its complementarity when combining with other robust SLU techniques.

preprint2022arXiv

Conversational AI Systems for Social Good: Opportunities and Challenges

Conversational artificial intelligence (ConvAI) systems have attracted much academic and commercial attention recently, making significant progress on both fronts. However, little existing work discusses how these systems can be developed and deployed for social good in real-world applications, with comprehensive case studies and analyses of pros and cons. In this paper, we briefly review the progress the community has made towards better ConvAI systems and reflect on how existing technologies can help advance social good initiatives from various angles that are unique for ConvAI, or not yet become common knowledge in the community. We further discuss about the challenges ahead for ConvAI systems to better help us achieve these goals and highlight the risks involved in their development and deployment in the real world.

preprint2022arXiv

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation. Unlike previous anticipation tasks that aim at action label prediction, our work targets at generating natural language outputs that provide interpretable and accurate descriptions of future action steps. It is a challenging task due to the lack of semantic information extracted from the instructional videos. To overcome this challenge, we propose a novel knowledge distillation framework to exploit the related external textual knowledge to assist the visual anticipation task. However, previous knowledge distillation techniques generally transfer information within the same modality. To bridge the gap between the visual and text modalities during the distillation process, we devise a novel cross-modal contrastive distillation (CCD) scheme, which facilitates knowledge distillation between teacher and student in heterogeneous modalities with the proposed cross-modal distillation loss. We evaluate our method on the Tasty Videos dataset. CCD improves the anticipation performance of the visual-alone student model by a large margin of 40.2% relatively in BLEU4. Our approach also outperforms the state-of-the-art approaches by a large margin.

preprint2022arXiv

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

Neural text generation models are typically trained by maximizing log-likelihood with the sequence cross entropy (CE) loss, which encourages an exact token-by-token match between a target sequence with a generated sequence. Such training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available. To address the challenge, we propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence. EISL is designed to be robust to various noises and edits in the target sequences. Moreover, the EISL computation is essentially an approximate convolution operation with target n-grams as kernels, which is easy to implement and efficient to compute with existing libraries. To demonstrate the effectiveness of EISL, we conduct experiments on a wide range of tasks, including machine translation with noisy target sequences, unsupervised text style transfer with only weak training signals, and non-autoregressive generation with non-predefined generation order. Experimental results show our method significantly outperforms the common CE loss and other strong baselines on all the tasks. EISL has a simple API that can be used as a drop-in replacement of the CE loss: https://github.com/guangyliu/EISL.

preprint2022arXiv

Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient BERT

Transformer-based pre-trained models, such as BERT, have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, deploying these models can be prohibitively costly, as the standard self-attention mechanism of the Transformer suffers from quadratic computational cost in the input sequence length. To confront this, we propose FCA, a fine- and coarse-granularity hybrid self-attention that reduces the computation cost through progressively shortening the computational sequence length in self-attention. Specifically, FCA conducts an attention-based scoring strategy to determine the informativeness of tokens at each layer. Then, the informative tokens serve as the fine-granularity computing units in self-attention and the uninformative tokens are replaced with one or several clusters as the coarse-granularity computing units in self-attention. Experiments on GLUE and RACE datasets show that BERT with FCA achieves 2x reduction in FLOPs over original BERT with <1% loss in accuracy. We show that FCA offers a significantly better trade-off between accuracy and FLOPs compared to prior methods.

preprint2022arXiv

Flow Completion Network: Inferring the Fluid Dynamics from Incomplete Flow Information using Graph Neural Networks

This paper introduces a novel neural network - flow completion network (FCN) - to infer the fluid dynamics, includ-ing the flow field and the force acting on the body, from the incomplete data based on Graph Convolution AttentionNetwork. The FCN is composed of several graph convolution layers and spatial attention layers. It is designed to inferthe velocity field and the vortex force contribution of the flow field when combined with the vortex force map (VFM)method. Compared with other neural networks adopted in fluid dynamics, the FCN is capable of dealing with bothstructured data and unstructured data. The performance of the proposed FCN is assessed by the computational fluiddynamics (CFD) data on the flow field around a circular cylinder. The force coefficients predicted by our model arevalidated against those obtained directly from CFD. Moreover, it is shown that our model effectively utilizes the exist-ing flow field information and the gradient information simultaneously, giving a better performance than the traditionalconvolution neural network (CNN)-based and deep neural network (DNN)-based models. Specifically, among all thecases of different Reynolds numbers and different proportions of the training dataset, the results show that the proposedFCN achieves a maximum norm mean square error of 5.86% in the test dataset, which is much lower than those of thetraditional CNN-based and DNN-based models (42.32% and 15.63% respectively).

preprint2022arXiv

Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue

Turn-taking, aiming to decide when the next speaker can start talking, is an essential component in building human-robot spoken dialogue systems. Previous studies indicate that multimodal cues can facilitate this challenging task. However, due to the paucity of public multimodal datasets, current methods are mostly limited to either utilizing unimodal features or simplistic multimodal ensemble models. Besides, the inherent class imbalance in real scenario, e.g. sentence ending with short pause will be mostly regarded as the end of turn, also poses great challenge to the turn-taking decision. In this paper, we first collect a large-scale annotated corpus for turn-taking with over 5,000 real human-robot dialogues in speech and text modalities. Then, a novel gated multimodal fusion mechanism is devised to utilize various information seamlessly for turn-taking prediction. More importantly, to tackle the data imbalance issue, we design a simple yet effective data augmentation method to construct negative instances without supervision and apply contrastive learning to obtain better feature representations. Extensive experiments are conducted and the results demonstrate the superiority and competitiveness of our model over several state-of-the-art baselines.

preprint2022arXiv

Label Anchored Contrastive Learning for Language Understanding

Contrastive learning (CL) has achieved astonishing progress in computer vision, speech, and natural language processing fields recently with self-supervised learning. However, CL approach to the supervised setting is not fully explored, especially for the natural language understanding classification task. Intuitively, the class label itself has the intrinsic ability to perform hard positive/negative mining, which is crucial for CL. Motivated by this, we propose a novel label anchored contrastive learning approach (denoted as LaCon) for language understanding. Specifically, three contrastive objectives are devised, including a multi-head instance-centered contrastive loss (ICL), a label-centered contrastive loss (LCL), and a label embedding regularizer (LER). Our approach does not require any specialized network architecture or any extra data augmentation, thus it can be easily plugged into existing powerful pre-trained language models. Compared to the state-of-the-art baselines, LaCon obtains up to 4.1% improvement on the popular datasets of GLUE and CLUE benchmarks. Besides, LaCon also demonstrates significant advantages under the few-shot and data imbalance settings, which obtains up to 9.4% improvement on the FewGLUE and FewCLUE benchmarking tasks.

preprint2022arXiv

LUNA: Learning Slot-Turn Alignment for Dialogue State Tracking

Dialogue state tracking (DST) aims to predict the current dialogue state given the dialogue history. Existing methods generally exploit the utterances of all dialogue turns to assign value for each slot. This could lead to suboptimal results due to the information introduced from irrelevant utterances in the dialogue history, which may be useless and can even cause confusion. To address this problem, we propose LUNA, a sLot-tUrN Alignment enhanced approach. It first explicitly aligns each slot with its most relevant utterance, then further predicts the corresponding value based on this aligned utterance instead of all dialogue utterances. Furthermore, we design a slot ranking auxiliary task to learn the temporal correlation among slots which could facilitate the alignment. Comprehensive experiments are conducted on multi-domain task-oriented dialogue datasets, i.e., MultiWOZ 2.0, MultiWOZ 2.1, and MultiWOZ 2.2. The results show that LUNA achieves new state-of-the-art results on these datasets.

preprint2022arXiv

OPERA:Operation-Pivoted Discrete Reasoning over Text

Machine reading comprehension (MRC) that requires discrete reasoning involving symbolic operations, e.g., addition, sorting, and counting, is a challenging task. According to this nature, semantic parsing-based methods predict interpretable but complex logical forms. However, logical form generation is nontrivial and even a little perturbation in a logical form will lead to wrong answers. To alleviate this issue, multi-predictor -based methods are proposed to directly predict different types of answers and achieve improvements. However, they ignore the utilization of symbolic operations and encounter a lack of reasoning ability and interpretability. To inherit the advantages of these two types of methods, we propose OPERA, an operation-pivoted discrete reasoning framework, where lightweight symbolic operations (compared with logical forms) as neural modules are utilized to facilitate the reasoning ability and interpretability. Specifically, operations are first selected and then softly executed to simulate the answer reasoning procedure. Extensive experiments on both DROP and RACENum datasets show the reasoning ability of OPERA. Moreover, further analysis verifies its interpretability.

preprint2022arXiv

SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition

End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision. This could result in recognition errors due to similar-phoneme confusion or phoneme reduction. To alleviate this problem, we propose a novel framework based on Supervised Contrastive Learning (SCaLa) to enhance phonemic representation learning for end-to-end ASR systems. Specifically, we extend the self-supervised Masked Contrastive Predictive Coding (MCPC) to a fully-supervised setting, where the supervision is applied in the following way. First, SCaLa masks variable-length encoder features according to phoneme boundaries given phoneme forced-alignment extracted from a pre-trained acoustic model; it then predicts the masked features via contrastive learning. The forced-alignment can provide phoneme labels to mitigate the noise introduced by positive-negative pairs in self-supervised MCPC. Experiments on reading and spontaneous speech datasets show that our proposed approach achieves 2.8 and 1.4 points Character Error Rate (CER) absolute reductions compared to the baseline, respectively.

preprint2022arXiv

SE-GAN: Skeleton Enhanced GAN-based Model for Brush Handwriting Font Generation

Previous works on font generation mainly focus on the standard print fonts where character&#39;s shape is stable and strokes are clearly separated. There is rare research on brush handwriting font generation, which involves holistic structure changes and complex strokes transfer. To address this issue, we propose a novel GAN-based image translation model by integrating the skeleton information. We first extract the skeleton from training images, then design an image encoder and a skeleton encoder to extract corresponding features. A self-attentive refined attention module is devised to guide the model to learn distinctive features between different domains. A skeleton discriminator is involved to first synthesize the skeleton image from the generated image with a pre-trained generator, then to judge its realness to the target one. We also contribute a large-scale brush handwriting font image dataset with six styles and 15,000 high-resolution images. Both quantitative and qualitative experimental results demonstrate the competitiveness of our proposed model.

preprint2022arXiv

Simultaneous Position and Orientation Planning of Nonholonomic Multi-Robot Systems: A Dynamic Vector Field Approach

This paper considers the simultaneous position and orientation planning of nonholonomic multirobot systems. Different from common researches which only focus on final position constraints, we model the nonholonomic mobile robot as a rigid body and introduce the orientation as well as position constraints for the robot&#39;s final states. In other words, robots should not only reach the specified positions, but also point to the desired orientations simultaneously. The challenge of this problem lies in the underactuation of full-state motion planning, since three states need to be planned by mere two control inputs. To this end, we propose a dynamic vector field (DVF) based on the rigid body modeling. Specifically, the dynamics of the robot orientation are brought into the vector field, implying that the vector field is not static on the 2-D plane anymore, but a dynamic one varying with the attitude angle. Hence, each robot can move along the integral curve of the DVF to arrive at the desired position, and in the meantime, the attitude angle can converge to the specified value following the orientation dynamics. Subsequently, by designing a circular vector field under the framework of the DVF, we further study the obstacle avoidance and mutual-robot-collision avoidance in the motion planning. Finally, numerical simulation examples are provided to verify the effectiveness of the proposed methodology.

preprint2022arXiv

The low-entropy hydration shell at the binding site of spike RBD determines the contagiousness of SARS-CoV-2 variants

The infectivity of SARS-CoV-2 depends on the binding affinity of the receptor-binding domain (RBD) of the spike protein with the angiotensin converting enzyme 2 (ACE2) receptor. The calculated RBD-ACE2 binding energies indicate that the difference in transmission efficiency of SARS-CoV-2 variants cannot be fully explained by electrostatic interactions, hydrogen-bond interactions, van der Waals interactions, internal energy, and nonpolar solvation energies. Here, we demonstrate that low-entropy regions of hydration shells around proteins drive hydrophobic attraction between shape-matched low-entropy regions of the hydration shells, which essentially coordinates protein-protein binding in rotational-configurational space of mutual orientations and determines the binding affinity. An innovative method was used to identify the low-entropy regions of the hydration shells of the RBDs of multiple SARS-CoV-2 variants and the ACE2. We observed integral low-entropy regions of hydration shells covering the binding sites of the RBDs and matching in shape to the low-entropy region of hydration shell at the binding site of the ACE2. The RBD-ACE2 binding is thus found to be guided by hydrophobic collapse between the shape-matched low-entropy regions of the hydration shells. A measure of the low-entropy of the hydration shells can be obtained by counting the number of hydrophilic groups expressing hydrophilicity within the binding sites. The low-entropy level of hydration shells at the binding site of a spike protein is found to be an important indicator of the contagiousness of the coronavirus.

preprint2022arXiv

Ultra-fast synthesis and thermodynamic analysis of MoAlB by self-propagating high temperature combustion synthesis

MoAlB as a typical member of the MAB phases has attracted much growing attention due to its unique properties. However, the low production of MoAlB powders limits its further development and potential applications. To respond this challenge, the ultra-fast preparation of high-purity MoAlB powders is achieved in a few seconds by self-propagating high temperature combustion synthesis (SHS) using the raw powder mixture at the atomic ratio of 1Mo/1.3Al/1B. The reaction mechanism in SHS process was obtained by analyzing the corresponding composition changes of starting materials. Furthermore, the thermodynamic prediction for the SHS reaction of Mo + (1+y)Al + B = MoAlB + yAl is consistent with the present experiments, where the preparation of MoAlB also conforms to two common self-propagating conditions of SHS. Enthalpy vs temperature curve shows that the adiabatic temperature of the reaction decreases with increasing the amount of excuse Al, but increase when pre-heating the reactants. The thermodynamic calculation method also provides a new idea for the preparation of other MAB phases by SHS.

preprint2021arXiv

Conversational Query Rewriting with Self-supervised Learning

Context modeling plays a critical role in building multi-turn dialogue systems. Conversational Query Rewriting (CQR) aims to simplify the multi-turn dialogue modeling into a single-turn problem by explicitly rewriting the conversational query into a self-contained utterance. However, existing approaches rely on massive supervised training data, which is labor-intensive to annotate. And the detection of the omitted important information from context can be further improved. Besides, intent consistency constraint between contextual query and rewritten query is also ignored. To tackle these issues, we first propose to construct a large-scale CQR dataset automatically via self-supervised learning, which does not need human annotation. Then we introduce a novel CQR model Teresa based on Transformer, which is enhanced by self-attentive keywords detection and intent consistency constraint. Finally, we conduct extensive experiments on two public datasets. Experimental results demonstrate that our proposed model outperforms existing CQR baselines significantly, and also prove the effectiveness of self-supervised learning on improving the CQR performance.

preprint2021arXiv

Defect-free arbitrary-geometry assembly of mixed-species atom arrays

Optically trapped mixed-species single atom arrays with arbitrary geometries are an attractive and promising platform for various applications, because tunable quantum systems with multiple components provide extra degrees of freedom for experimental control. Here, we report the first demonstration of two-dimensional $6\times4$ dual-species atom assembly with a filling fraction of 0.88 (0.89) for $^{85}$Rb ($^{87}$Rb) atoms. This mixed-species atomic synthetic is achieved via rearranging initially randomly distributed atoms using a sorting algorithm (heuristic heteronuclear algorithm) which is proposed for bottom-up atom assembly with both user-defined geometries and two-species atom number ratios. Our fully tunable hybrid-atom system of scalable advantages is a good starting point for high-fidelity quantum logic, many-body quantum simulation and forming defect-free single molecule arrays.

preprint2021arXiv

High fidelity entanglement of neutral atoms via a Rydberg-mediated single-modulated-pulse controlled-PHASE gate

Neutral atom platform has become an attractive choice to study the science of quantum information and quantum simulation, where intense efforts have been devoted to the entangling processes between individual atoms. For the development of this area, two-qubit controlled-PHASE gate via Rydberg blockade is one of the most essential elements. Recent theoretical studies have suggested the advantages of introducing non-trivial waveform modulation into the gate protocol, which is anticipated to improve its performance towards the next stage. We report our recent experimental results in realizing a two-qubit controlled-PHASE($C_Z$) gate via off-resonant modulated driving(ORMD) embedded in two-photon transition for Rb atoms. It relies upon a single modulated driving pulse with a carefully calculated smooth waveform to gain the appropriate phase accumulations required by the two-qubit gate. Combining this $C_Z$ gate with global microwave pulses, two-atom entanglement is generated with the raw fidelity of 0.945(6). Accounting for state preparation and measurement (SPAM) errors, we extract the entanglement operation fidelity to be 0.980(7). Our work features completing the $C_Z$ gate operation within a single pulse to avoid shelved Rydberg population, thus demonstrate another promising route for realizing high-fidelity two-qubit gate for neutral atom platform.

preprint2021arXiv

Hydrophobic interaction determines docking affinity of SARS CoV 2 variants with antibodies

Preliminary epidemiologic, phylogenetic and clinical findings suggest that several novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have increased transmissibility and decreased efficacy of several existing vaccines. Four mutations in the receptor-binding domain (RBD) of the spike protein that are reported to contribute to increased transmission. Understanding physical mechanism responsible for the affinity enhancement between the SARS-CoV-2 variants and ACE2 is the &#34;urgent challenge&#34; for developing blockers, vaccines and therapeutic antibodies against the coronavirus disease 2019 (COVID-19) pandemic. Based on a hydrophobic-interaction-based protein docking mechanism, this study reveals that the mutation N501Y obviously increased the hydrophobic attraction and decrease hydrophilic repulsion between the RBD and ACE2 that most likely caused the transmissibility increment of the variants. By analyzing the mutation-induced hydrophobic surface changes in the attraction and repulsion at the binding site of the complexes of the SARS-CoV-2 variants and antibodies, we found out that all the mutations of N501Y, E484K, K417N and L452R can selectively decrease or increase their binding affinity with some antibodies.

preprint2021arXiv

Nanoparticle Radiosensitization: from extended local effect modeling to a survival modificationframework of compound Poisson additive killing and its carbon dots validation

Objective: To construct an analytical model instead of local effect modeling for the prediction of the biological effectiveness of nanoparticle radiosensitization. Approach: An extended local effects model is first proposed with a more comprehensive description of the nanoparticles mediated local killing enhancements, but meanwhile puts forward challenging issues that remain difficult and need to be further studied. As a novel method instead of local effect modeling, a survival modification framework of compound Poisson additive killing is proposed, as the consequence of an independent additive killing by the assumed equivalent uniform doses of individual nanoparticles per cell under the LQ model. A compound Poisson killing (CPK)model based on the framework is thus derived, giving a general expression of nanoparticle mediated LQ parameter modification. For practical use, a simplified form of the model is also derived, as a concentration dependent correction only to the α parameter, with the relative correction (alpha&#34;/alpha)) dominated by the mean number, and affected by the agglomeration of nanoparticles per cell. For different agglomeration state, a monodispersion model of the dispersity factor η=1, and an agglomeration model of 2/3<η<1,are provided for practical prediction of (alpha&#34;/alpha) value respectively. Main results: Initial validation by the radiosensitization ofHepG2 cells by carbon dots showed a high accuracy of the CPK model. In a safe range of concentration (0.003-0.03 μg/μL)of the carbon dots, the prediction errors of the monodispersion and agglomeration models were both within 2%, relative to the clonogenic survival data of the sensitized HepG2 cells.

preprint2021arXiv

Selective Attention Based Graph Convolutional Networks for Aspect-Level Sentiment Classification

Aspect-level sentiment classification aims to identify the sentiment polarity towards a specific aspect term in a sentence. Most current approaches mainly consider the semantic information by utilizing attention mechanisms to capture the interactions between the context and the aspect term. In this paper, we propose to employ graph convolutional networks (GCNs) on the dependency tree to learn syntax-aware representations of aspect terms. GCNs often show the best performance with two layers, and deeper GCNs do not bring additional gain due to over-smoothing problem. However, in some cases, important context words cannot be reached within two hops on the dependency tree. Therefore we design a selective attention based GCN block (SA-GCN) to find the most important context words, and directly aggregate these information into the aspect-term representation. We conduct experiments on the SemEval 2014 Task 4 datasets. Our experimental results show that our model outperforms the current state-of-the-art.

preprint2020arXiv

A hydrophobic-interaction-based mechanism trigger docking between the SARS CoV 2 spike and angiotensin-converting enzyme 2

A recent experimental study found that the binding affinity between the cellular receptor human angiotensin converting enzyme 2 (ACE2) and receptor-binding domain (RBD) in spike (S) protein of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is more than 10-fold higher than that of the original severe acute respiratory syndrome coronavirus (SARS-CoV). However, main-chain structures of the SARS-CoV-2 RBD are almost the same with that of the SARS-CoV RBD. Understanding physical mechanism responsible for the outstanding affinity between the SARS-CoV-2 S and ACE2 is the &#34;urgent challenge&#34; for developing blockers, vaccines and therapeutic antibodies against the coronavirus disease 2019 (COVID-19) pandemic. Considering the mechanisms of hydrophobic interaction, hydration shell, surface tension, and the shielding effect of water molecules, this study reveals a hydrophobic-interaction-based mechanism by means of which SARS-CoV-2 S and ACE2 bind together in an aqueous environment. The hydrophobic interaction between the SARS-CoV-2 S and ACE2 protein is found to be significantly greater than that between SARS-CoV S and ACE2. At the docking site, the hydrophobic portions of the hydrophilic side chains of SARS-CoV-2 S are found to be involved in the hydrophobic interaction between SARS-CoV-2 S and ACE2. We propose a method to design live attenuated viruses by mutating several key amino acid residues of the spike protein to decrease the hydrophobic surface areas at the docking site. Mutation of a small amount of residues can greatly reduce the hydrophobic binding of the coronavirus to the receptor, which may be significant reduce infectivity and transmissibility of the virus.

preprint2020arXiv

Graph Sequential Network for Reasoning over Sequences

Recently Graph Neural Network (GNN) has been applied successfully to various NLP tasks that require reasoning, such as multi-hop machine reading comprehension. In this paper, we consider a novel case where reasoning is needed over graphs built from sequences, i.e. graph nodes with sequence data. Existing GNN models fulfill this goal by first summarizing the node sequences into fixed-dimensional vectors, then applying GNN on these vectors. To avoid information loss inherent in the early summarization and make sequential labeling tasks on GNN output feasible, we propose a new type of GNN called Graph Sequential Network (GSN), which features a new message passing algorithm based on co-attention between a node and each of its neighbors. We validate the proposed GSN on two NLP tasks: interpretable multi-hop reading comprehension on HotpotQA and graph based fact verification on FEVER. Both tasks require reasoning over multiple documents or sentences. Our experimental results show that the proposed GSN attains better performance than the standard GNN based methods.

preprint2020arXiv

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in their input signals. However, many applications in the artificial intelligence field involve multiple modalities. Therefore, it is of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. In this paper, we provide a technical review of available models and learning methods for multimodal intelligence. The main focus of this review is the combination of vision and natural language modalities, which has become an important topic in both the computer vision and natural language processing research communities. This review provides a comprehensive analysis of recent works on multimodal deep learning from three perspectives: learning multimodal representations, fusing multimodal signals at various levels, and multimodal applications. Regarding multimodal representation learning, we review the key concepts of embedding, which unify multimodal signals into a single vector space and thereby enable cross-modality signal processing. We also review the properties of many types of embeddings that are constructed and learned for general downstream tasks. Regarding multimodal fusion, this review focuses on special architectures for the integration of representations of unimodal signals for a particular task. Regarding applications, selected areas of a broad interest in the current literature are covered, including image-to-text caption generation, text-to-image generation, and visual question answering. We believe that this review will facilitate future studies in the emerging field of multimodal intelligence for related communities.

preprint2020arXiv

Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product

Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval. While in the real world, the attribute values of a product are usually incomplete and vary over time, which greatly hinders the practical applications. In this paper, we propose a multimodal method to jointly predict product attributes and extract values from textual product descriptions with the help of the product images. We argue that product attributes and values are highly correlated, e.g., it will be easier to extract the values on condition that the product attributes are given. Thus, we jointly model the attribute prediction and value extraction tasks from multiple aspects towards the interactions between attributes and values. Moreover, product images have distinct effects on our tasks for different product attributes and values. Thus, we selectively draw useful visual information from product images to enhance our model. We annotate a multimodal product attribute value dataset that contains 87,194 instances, and the experimental results on this dataset demonstrate that explicitly modeling the relationship between attributes and values facilitates our method to establish the correspondence between them, and selectively utilizing visual product information is necessary for the task. Our code and dataset will be released to the public.

preprint2020arXiv

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method includes two-folds, first we extend the RotatE from 2D complex domain to high dimension space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on data set (FB15k-237) with many high in-degree connection nodes.

preprint2020arXiv

Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents

Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidences. In this paper, we propose an effective and interpretable Select, Answer and Explain (SAE) system to solve the multi-document RC problem. Our system first filters out answer-unrelated documents and thus reduce the amount of distraction information. This is achieved by a document classifier trained with a novel pairwise learning-to-rank loss. The selected answer-related documents are then input to a model to jointly predict the answer and supporting sentences. The model is optimized with a multi-task learning objective on both token level for answer prediction and sentence level for supporting sentences prediction, together with an attention-based interaction between these two tasks. Evaluated on HotpotQA, a challenging multi-hop RC data set, the proposed SAE system achieves top competitive performance in distractor setting compared to other existing systems on the leaderboard.

preprint2020arXiv

Speaker Diarization with Lexical Information

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.

preprint2020arXiv

The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

Human conversations are complicated and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models become more and more prevalent which need a huge amount of real conversation data. In this paper, we construct a large-scale real scenario Chinese E-commerce conversation corpus, JDDC, with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., goal-driven, and long-term dependency among the context. It also covers various dialogue types including task-oriented, chitchat and question-answering. Extra intent information and three well-annotated challenge sets are also provided. Then, we evaluate several retrieval-based and generative models to provide basic benchmark performance on the JDDC corpus. And we hope JDDC can serve as an effective testbed and benefit the development of fundamental research in dialogue task

preprint2020arXiv

The role of hydrophobic interactions in folding of $β$-sheets

Exploring the protein-folding problem has been a long-standing challenge in molecular biology. Protein folding is highly dependent on folding of secondary structures as the way to pave a native folding pathway. Here, we demonstrate that a feature of a large hydrophobic surface area covering most side-chains on one side or the other side of adjacent $β$-strands of a $β$-sheet is prevail in almost all experimentally determined $β$-sheets, indicating that folding of $β$-sheets is most likely triggered by multistage hydrophobic interactions among neighbored side-chains of unfolded polypeptides, enable $β$-sheets fold reproducibly following explicit physical folding codes in aqueous environments. $β$-turns often contain five types of residues characterized with relatively small exposed hydrophobic proportions of their side-chains, that is explained as these residues can block hydrophobic effect among neighbored side-chains in sequence. Temperature dependence of the folding of $β$-sheet is thus attributed to temperature dependence of the strength of the hydrophobicity. The hydrophobic-effect-based mechanism responsible for $β$-sheets folding is verified by bioinformatics analyses of thousands of results available from experiments. The folding codes in amino acid sequence that dictate formation of a $β$-hairpin can be deciphered through evaluating hydrophobic interaction among side-chains of an unfolded polypeptide from a $β$-strand-like thermodynamic metastable state.

preprint2019arXiv

Balanced Coherence Times of Mixed-Species Atomic Qubits in a Dual $3\times3$ Magic-Intensity Optical Dipole Trap Array

In this work, we construct a polarization-mediated magic-intensity (MI) optical dipole trap (ODT) array, in which the detrimental effects of light shifts on the mixed-species qubits are efficiently mitigated so that the coherence times of the mixed-species qubits are both substantially enhanced and balanced for the first time. This mixed-species magic trapping technique relies on the tunability of the coefficient of the third-order cross term and ground state hyperpolarizability, which are inherently dependent on the degree of circular polarization of the trap laser. Experimentally, polarization of the ODT array for $^{85}$Rb qubits is finely adjusted to a definite value so that its working magnetic field required for magic trapping amounts to the one required for magically trapping $^{87}$Rb qubits in another ODT array with fully circular polarization. Ultimately, in such a polarization-mediated MI-ODT array, the coherence times of $^{87}$Rb and $^{85}$Rb qubits are respectively enhanced up to 891$\pm$47 ms and 943$\pm$35 ms. Furthermore, a new source of dephasing effect is revealed, which arises from the noise of the elliptic polarization, and the reduction in corresponding dephasing effect on the $^{85}$Rb qubits is attainable by use of shallow magic intensity. It is anticipated that the novel mixed-species MI-ODT array is a versatile platform for building scalable quantum computers with neutral atoms.

preprint2019arXiv

Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination

We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting artist&#39;s original painting style, and applying the principles of Dadaism and impossibility of improvisation. Our system indicates that AI and artist can collaborate seamlessly to create imaginative artistic painting and Mappa Mundi has been applied in art exhibition in UCCA, Beijing

preprint2019arXiv

Preparation of a Heteronuclear Two-atom System in the 3D Motional Ground State in an Optical Tweezer

We report the realization of a heteronuclear two-atom of $^{87}$Rb-$^{85}$Rb in the ground state of an optical tweezer (OT). Starting by trapping two different isotopic single atoms, a $^{87}$Rb and a $^{85}$Rb in two strongly focused and linearly polarized OT with 4 $μ$m apart, we perform simultaneously three dimensional Raman sideband cooling for both atoms and the obtained 3D ground state probabilities of $^{87}$Rb and $^{85}$Rb are 0.91(5) and 0.91(10) respectively. There is no obvious crosstalk observed during the cooling process. We then merge them into one tweezer via a species-dependent transport, where the species-dependent potentials are made by changing the polarization of the OTs for each species from linear polarization to the desired circular polarization. The measurable increment of vibrational quantum due to merging is $0.013(1)$ for the axial dimension. This two-atom system can be used to investigate cold collisional physics, to form quantum logic gates, and to build a single heteronuclear molecule. It can also be scaled up to few-atom regime and extended to other atomic species and molecules, and thus to ultracold chemistry.