Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
44 works
0 followers
24 topics
4 close collaborators

Published work

44 published items

preprint · 2026 · arXiv

A radiation two-phase flow model for simulating plasma-liquid interactions

In laser-produced plasma (LPP) extreme ultraviolet (EUV) sources, deformation of a tin droplet into an optimal target shape is governed by its interaction with a pre-pulse laser-generated plasma. This interaction is mediated by a transient ablation pressure, whose complex spatio-temporal evolution remains experimentally inaccessible. Existing modeling approaches are limited: Empirical pressure-impulse models neglect dynamic plasma feedback, while advanced radiation-hydrodynamic codes often fail to resolve late-time droplet hydrodynamics. To bridge this gap, we propose a radiation two-phase flow model based on a diffuse interface methodology. The model integrates radiation hydrodynamics for the plasma with the Euler equations for a weakly compressible liquid, extending a five-equation diffuse interface formulation to incorporate radiation transport, thermal conduction, and ionization. This formulation enforces pressure and velocity equilibrium across the diffuse interface region, with closure models constructed to ensure correct jump conditions at interfaces and asymptotically recover the pure-phase equations in bulk regions. Then, we apply the model to simulate a benchmark pre-pulse scenario, where a 50 micron tin droplet is irradiated by a 10 ns laser pulse. The simulations capture the rapid plasma expansion and subsequent inertial flattening of the droplet into a thin, curved sheet over microsecond timescales. Notably, the model reproduces experimentally observed features (such as an axial jet) rarely replicated in prior simulations. Quantitative agreement with experimental data for sheet dimensions and velocity validates the approach. The proposed model self-consistently couples laser-plasma physics with compressible droplet dynamics, providing a powerful tool for fundamental studies of plasma-liquid interactions in LPP-EUV source optimization.

preprint · 2026 · arXiv

Belief Memory: Agent Memory Under Partial Observability

LLM agents that operate over long context depend on external memory to accumulate knowledge over time. However, existing methods typically store each observation as a single deterministic conclusion (e.g., inferring "API X failed" from temporary errors), even though such observations are inherently partial and potentially ambiguous. By committing to one conclusion and discarding uncertainty, these methods introduce self-reinforcing error: the agent acts on the stored conclusion, never revisits alternatives, and reinforces the conclusion over time. To address this issue, we propose BeliefMem, which shifts the memory paradigm from committing to a single conclusion per observation to retaining multiple candidate conclusions with their probabilities. Concretely, BeliefMem stores the candidate conclusions as separate memory entries, each carrying a probability that is updated via Noisy-OR rules as new observations arrive. At retrieval, all candidates surface together with their probabilities, keeping alternatives visible to the agent. Since each conclusion in memory retains its probability, BeliefMem preserves the uncertainty that the deterministic paradigm discards, enabling the agent to act with high confidence on well-evidenced knowledge while retaining the capacity to update its confidence when new evidence arrives. Empirical evaluations on LoCoMo and ALFWorld benchmarks show that, even with limited data, BeliefMem achieves the best average performance, remarkably outperforming well-known baselines. More broadly, such probabilistic memory produces substantial gains and explores a new direction for agent memory in partially observable environments.
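The abstract above mentions probabilities updated via Noisy-OR rules but does not give the exact rule. As a rough illustration only, the textbook Noisy-OR combination of an existing belief with one new piece of supporting evidence could look like the sketch below. The function name `noisy_or_update`, the `memory` dictionary and the numeric evidence strengths are all hypothetical and are not taken from BeliefMem.

```python
def noisy_or_update(prior: float, evidence: float) -> float:
    """Textbook Noisy-OR: the conclusion holds unless every independent
    cause (the prior belief and the new evidence) fails to support it."""
    if not (0.0 <= prior <= 1.0 and 0.0 <= evidence <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - (1.0 - prior) * (1.0 - evidence)


# Hypothetical memory: candidate conclusion -> current probability.
memory = {"API X is down": 0.30, "API X is rate-limited": 0.20}

# Made-up support that one new observation lends to each candidate.
support = {"API X is down": 0.10, "API X is rate-limited": 0.50}

for conclusion, strength in support.items():
    memory[conclusion] = noisy_or_update(memory[conclusion], strength)

# Surface every candidate with its probability, most likely first.
for conclusion, prob in sorted(memory.items(), key=lambda kv: -kv[1]):
    print(f"{prob:.2f}  {conclusion}")
```

Both candidates stay in memory after the update, so a later observation can still shift the ranking, which is the behaviour the abstract contrasts with storing a single deterministic conclusion.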

preprint · 2026 · arXiv

Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars. Existing methods typically learn frame-level representations for each video by designing inter-frame temporal modeling strategies or inter-video interaction at the coarse video-level granularity. However, they treat each episode task in isolation and neglect fine-grained temporal relation modeling between videos, thus failing to capture shared fine-grained temporal patterns across videos and reuse temporal knowledge from historical tasks. In light of this, we propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR, which unifies three types of relation modeling (inter-frame, inter-video, and inter-task) to learn task-specific temporal patterns from a holistic view. Going beyond inter-frame temporal interactions, we further devise two components to explore inter-video and inter-task relationships, respectively: i) Inter-video Semantic Correlation (ISC) performs cross-video frame-level interactions in a fine-grained manner, thereby capturing task-specific query features and enhancing both intra-class consistency and inter-class separability; ii) Inter-task Knowledge Transfer (IKT) retrieves and aggregates relevant temporal knowledge from the bank, which stores diverse temporal patterns from historical episode tasks. Extensive experiments on five benchmarks show that HR2G-shot outperforms current leading FSAR methods.

preprint · 2024 · arXiv

FABind: Fast and Accurate Protein-Ligand Binding

Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose FABind, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. FABind incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed FABind demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available at https://github.com/QizhiPei/FABind

preprint · 2023 · arXiv

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice through mock interviews with one another. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.

preprint · 2023 · arXiv

Follow the Timeline! Generating Abstractive and Extractive Timeline Summary in Chronological Order

Nowadays, time-stamped web documents related to a general news query flood the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information retained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting a summary, where the extracted summary also follows the time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.

preprint · 2023 · arXiv

Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging

The collection and curation of large-scale medical datasets from multiple institutions is essential for training accurate deep learning models, but privacy concerns often hinder data sharing. Federated learning (FL) is a promising solution that enables privacy-preserving collaborative learning among different institutions, but it generally suffers from performance deterioration due to heterogeneous data distributions and a lack of quality labeled data. In this paper, we present a robust and label-efficient self-supervised FL framework for medical image analysis. Our method introduces a novel Transformer-based self-supervised pre-training paradigm that pre-trains models directly on decentralized target task datasets using masked image modeling, to facilitate more robust representation learning on heterogeneous data and effective knowledge transfer to downstream models. Extensive empirical results on simulated and real-world medical imaging non-IID federated datasets show that masked image modeling with Transformers significantly improves the robustness of models against various degrees of data heterogeneity. Notably, under severe data heterogeneity, our method, without relying on any additional pre-training data, achieves an improvement of 5.06%, 1.53% and 4.58% in test accuracy on retinal, dermatology and chest X-ray classification compared to the supervised baseline with ImageNet pre-training. In addition, we show that our federated self-supervised pre-training methods yield models that generalize better to out-of-distribution data and perform more effectively when fine-tuning with limited labeled data, compared to existing FL algorithms. The code is available at https://github.com/rui-yan/SSL-FL.

preprint · 2022 · arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the application of BMs in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we clearly summarize the current studies and propose some future research directions. At the end of this paper, we discuss the further development of BMs from a more general view.

preprint · 2022 · arXiv

Accelerating CFD simulation with high order finite difference method on curvilinear coordinates for modern GPU clusters

High-fidelity flow simulation of complex geometries at high Reynolds number (Re) is still very challenging and requires more powerful computational capability from HPC systems. However, the development of HPC with traditional CPU architectures suffers from bottlenecks due to high power consumption and technical difficulties. Heterogeneous-architecture computing has emerged as a promising solution to these difficulties. GPU acceleration has been used in low-order-scheme CFD solvers on structured grids and in high-order-scheme solvers on unstructured meshes. High-order finite difference methods on structured grids possess many advantages, e.g. high efficiency, robustness and low storage; however, the strong dependence among points in a high-order finite difference scheme still limits their application on GPU platforms. In the present work, we propose a set of hardware-aware techniques to optimize the efficiency of data transfer between CPU and GPU and of communication between GPUs. An in-house multi-block structured CFD solver with high-order finite difference methods on curvilinear coordinates is ported onto the GPU platform and obtains satisfactory performance, with a maximum speedup of around 2000x over a single CPU core. This work provides an efficient solution for applying GPU computing to CFD simulation with certain high-order finite difference methods on current GPU heterogeneous computers. Tests show that significant acceleration can be achieved on different GPUs.

preprint · 2022 · arXiv

All in One: Exploring Unified Video-Language Pre-training

Mainstream Video-Language Pre-training models [actbert, clipbert, violet] consist of three parts: a video encoder, a text encoder, and a video-text fusion Transformer. They pursue better performance by using heavier unimodal encoders or multimodal fusion Transformers, resulting in more parameters and lower efficiency in downstream tasks. In this work, we introduce, for the first time, an end-to-end video-language model, namely the all-in-one Transformer, that embeds raw video and textual signals into joint representations using a unified backbone architecture. We argue that the unique temporal information of video data turns out to be a key barrier hindering the design of a modality-agnostic Transformer. To overcome the challenge, we introduce a novel and effective token rolling operation to encode temporal representations from video clips in a non-parametric manner. The careful design enables the representation learning of both video-text multimodal inputs and unimodal inputs using a unified backbone model. Our pre-trained all-in-one Transformer is transferred to various downstream video-text tasks after fine-tuning, including text-video retrieval, video-question answering, multiple choice and visual commonsense reasoning. State-of-the-art performance with minimal model FLOPs on nine datasets demonstrates the superiority of our method compared to competitive counterparts. The code and pretrained model have been released at https://github.com/showlab/all-in-one.

preprint · 2022 · arXiv

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge [Yi2022ADD]. The same system was used for both rounds of evaluation in Track 3.2 with a similar training methodology. The first round of Track 3.2 data is generated from text-to-speech (TTS) or voice conversion (VC) algorithms, while the second round of data consists of fake audio generated by other participants in Track 3.1, aiming to spoof our systems. Our systems use a standard 34-layer ResNet with multi-head attention pooling [india2019self] to learn the discriminative embedding for fake audio and spoof detection. We further utilize neural stitching to boost the model's generalization capability so that it performs equally well in different tasks; more details are explained in the following sections. The experiments show that our proposed method outperforms all other systems with a 10.1% equal error rate (EER) in Track 3.2.

preprint · 2022 · arXiv

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition

Learning spatial-temporal relations among multiple actors is crucial for group activity recognition. Different group activities often show diversified interactions between actors in the video. Hence, it is often difficult to model complex group activities from a single view of spatial-temporal actor evolution. To tackle this problem, we propose a distinct Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers in two complementary orders, enhancing actor relations by integrating merits from different spatiotemporal paths. Moreover, we introduce a novel Multi-scale Actor Contrastive Loss (MAC-Loss) between the two interactive paths of Dual-AI. Via self-supervised actor consistency at both the frame and video levels, MAC-Loss can effectively distinguish individual actor representations to reduce action confusion among different actors. Consequently, our Dual-AI can boost group activity recognition by fusing such discriminative features of different actors. To evaluate the proposed approach, we conduct extensive experiments on widely used benchmarks, including the Volleyball, Collective Activity, and NBA datasets. The proposed Dual-AI achieves state-of-the-art performance on all these datasets. It is worth noting that the proposed Dual-AI with 50% of the training data outperforms a number of recent approaches trained with 100% of the training data. This confirms the generalization power of Dual-AI for group activity recognition, even under the challenging scenario of limited supervision.

preprint · 2022 · arXiv

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

In this report, we propose a video-language pretraining (VLP) based solution [kevin2022egovlp] for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). In particular, we exploit the recently released Ego4D dataset [grauman2021ego4d] to pioneer Egocentric VLP in terms of pretraining dataset, pretraining objective, and development set. Based on the above three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation or video-only representation to several video downstream tasks. Our Egocentric VLP achieves 10.46 R@1 & IoU@0.3 on NLQ, 10.33 mAP on MQ, 74% Acc on OSCC, and 0.67 sec error on PNR. The code is available at https://github.com/showlab/EgoVLP.

preprint · 2022 · arXiv

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

In this report, we propose a video-language pretraining (VLP) based solution [kevin2022egovlp] for the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge. In particular, we exploit the recently released Ego4D dataset [grauman2021ego4d] to pioneer Egocentric VLP in terms of pretraining dataset, pretraining objective, and development set. Based on the above three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation to the MIR benchmark. Furthermore, we devise an adaptive multi-instance max-margin loss to effectively fine-tune the model and adopt the dual-softmax technique for reliable inference. Our best single model obtains strong performance on the challenge test set with 47.39% mAP and 61.44% nDCG. The code is available at https://github.com/showlab/EgoVLP.

preprint · 2022 · arXiv

Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

Artificial intelligence and robotic competitions are accompanied by a class of game paradigms in which each player privately commits a strategy to a game system, which simulates the game using the collected joint strategy and then returns payoffs to the players. This paper considers the strategy commitment for two-player symmetric games in which the players' strategy spaces are identical and their payoffs are symmetric. First, we introduce two digraph-based metrics at a meta-level for strategy evaluation in two-agent reinforcement learning, grounded on sink equilibrium. The metrics rank the strategies of a single player and determine the set of strategies which are preferred for the private commitment. Then, in order to find the preferred strategies under the metrics, we propose two variants of the classical learning algorithm self-play, called strictly best-response and weakly better-response self-play. By modeling learning processes as walks over joint-strategy response digraphs, we prove that the strategies learnt by the two variants are preferred under the two metrics, respectively. The strategies preferred under both metrics are identified, and the adjacency matrices induced by one metric and one variant are connected. Finally, simulations are provided to illustrate the results.

preprint · 2022 · arXiv

Expansion-Squeeze-Excitation Fusion Network for Elderly Activity Recognition

This work focuses on the task of elderly activity recognition, which is challenging due to the presence of both individual actions and human-object interactions in elderly activities. Thus, we attempt to effectively aggregate the discriminative information of actions and interactions from both RGB videos and skeleton sequences by attentively fusing multi-modal features. Recently, some nonlinear multi-modal fusion approaches have been proposed that use a nonlinear attention mechanism extended from Squeeze-and-Excitation Networks (SENet). Inspired by this, we propose a novel Expansion-Squeeze-Excitation Fusion Network (ESE-FN) to effectively address the problem of elderly activity recognition, which learns modal and channel-wise Expansion-Squeeze-Excitation (ESE) attentions for attentively fusing the multi-modal features in both modal-wise and channel-wise ways. Furthermore, we design a new Multi-modal Loss (ML) to keep the consistency between the single-modal features and the fused multi-modal features by adding a penalty on the difference between the minimum prediction loss on single modalities and the prediction loss on the fused modality. Finally, we conduct experiments on the largest-scale elderly activity dataset, i.e., ETRI-Activity3D (including 110,000+ videos and 50+ categories), to demonstrate that the proposed ESE-FN achieves the best accuracy compared with state-of-the-art methods. In addition, more extensive experimental results show that the proposed ESE-FN is also comparable to other methods on the normal action recognition task.

preprint · 2022 · arXiv

Finite-horizon Equilibria for Neuro-symbolic Concurrent Stochastic Games

We present novel techniques for neuro-symbolic concurrent stochastic games, a recently proposed modelling formalism to represent a set of probabilistic agents operating in a continuous-space environment using a combination of neural network based perception mechanisms and traditional symbolic methods. To date, only zero-sum variants of the model were studied, which is too restrictive when agents have distinct objectives. We formalise notions of equilibria for these models and present algorithms to synthesise them. Focusing on the finite-horizon setting, and (global) social welfare subgame-perfect optimality, we consider two distinct types: Nash equilibria and correlated equilibria. We first show that an exact solution based on backward induction may yield arbitrarily bad equilibria. We then propose an approximation algorithm called frozen subgame improvement, which proceeds through iterative solution of nonlinear programs. We develop a prototype implementation and demonstrate the benefits of our approach on two case studies: an automated car-parking system and an aircraft collision avoidance system.

preprint · 2022 · arXiv

Learning to Express in Knowledge-Grounded Conversation

Grounding dialogue generation in extra knowledge has shown great potential for building a system capable of replying with knowledgeable and engaging responses. Existing studies focus on how to synthesize a response with proper knowledge, yet neglect that the same knowledge could be expressed differently by speakers even under the same context. In this work, we mainly consider two aspects of knowledge expression, namely the structure of the response and the style of the content in each part. We therefore introduce two sequential latent variables to represent the structure and the content style respectively. We propose a segmentation-based generation model and optimize the model by a variational approach to discover the underlying pattern of knowledge expression in a response. Evaluation results on two benchmarks indicate that our model can learn the structure style defined by a few examples and generate responses in the desired content style.

preprint · 2022 · arXiv

MISC: A MIxed Strategy-Aware Model Integrating COMET for Emotional Support Conversation

Applying existing methods to emotional support conversation, which provides valuable assistance to people who are in need, has two major limitations: (a) they generally employ a conversation-level emotion label, which is too coarse-grained to capture the user's instant mental state; (b) most of them focus on expressing empathy in the response(s) rather than gradually reducing the user's distress. To address these problems, we propose a novel model, MISC, which first infers the user's fine-grained emotional status and then responds skillfully using a mixture of strategies. Experimental results on the benchmark dataset demonstrate the effectiveness of our method and reveal the benefits of fine-grained emotion understanding as well as mixed strategy modeling. Our code and data can be found at https://github.com/morecry/MISC.

preprint · 2022 · arXiv

Object-aware Video-language Pre-training for Retrieval

Recently, by introducing large-scale datasets and strong transformer networks, video-language pre-training has shown great success, especially for retrieval. Yet, existing video-language transformer models do not explicitly perform fine-grained semantic alignment. In this work, we present Object-aware Transformers, an object-centric approach that extends the video-language transformer to incorporate object representations. The key idea is to leverage bounding boxes and object tags to guide the training process. We evaluate our model on three standard sub-tasks of video-text matching on four widely used benchmarks. We also provide in-depth analysis and detailed ablation of the proposed method. We show clear improvement in performance across all tasks and datasets considered, demonstrating the value of a model that incorporates object representations into a video-language architecture. The code will be released at https://github.com/FingerRec/OA-Transformer.

preprint · 2022 · arXiv

Optimizing doping parameters of target to enhance direct-drive implosion

Direct drive is an important approach to achieving ignition in inertial confinement fusion. To enhance implosion performance while keeping the risk of hydrodynamic instability at a low level, we have designed a procedure to optimize the parameters of a target doped with mid- or high-Z atoms. In the procedure, a one-dimensional implosion can be automatically simulated, while its implosion performance and high-dimensional instability are evaluated together at the same time. To find the optimal doping parameters, the procedure is performed within a global optimization framework, for which we have used particle swarm optimization in the current work. In the optimization, the opacity of mixture materials is quickly obtained by using an interpolation method, showing only a slight difference from the data of TOPS, which is an online doping program of Los Alamos National Laboratory. To test the procedure, optimization has been carried out for the CH ablator in the double cone ignition scheme [Phil. Trans. R. Soc. A. 378.2184 (2020)] by doping with Si and Cl. Both one- and two-dimensional simulations show that doping with either Si or Cl can efficiently mitigate the instability during the acceleration phase and does not result in significant degradation of the peak areal density. The results from one- and two-dimensional simulations qualitatively match each other, demonstrating the validity of our optimization procedure.
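The abstract above names particle swarm optimization as the global optimizer but gives no implementation details. The sketch below shows only the generic, textbook form of the algorithm on a made-up two-parameter cost. The function `pso_minimize`, its coefficients, and `toy_cost` are all hypothetical stand-ins: in the paper the objective would come from an implosion simulation, which is not reproduced here.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic particle swarm minimizer over box-constrained parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the allowed parameter range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_cost(x):
    """Made-up smooth cost with its minimum at (0.3, 0.7)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


best_x, best_cost = pso_minimize(toy_cost, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best_x, best_cost)
```

Because each objective evaluation is independent of the others within an iteration, this style of optimizer pairs naturally with an automated simulation that scores one parameter set at a time.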

preprint · 2022 · arXiv

OTExtSum: Extractive Text Summarisation with Optimal Transport

Extractive text summarisation aims to select salient sentences from a document to form a short yet informative summary. While learning-based methods have achieved promising results, they have several limitations, such as dependence on expensive training and lack of interpretability. Therefore, in this paper, we propose a novel non-learning-based method by formulating, for the first time, text summarisation as an Optimal Transport (OT) problem, namely the Optimal Transport Extractive Summariser (OTExtSum). Optimal sentence extraction is conceptualised as obtaining an optimal summary that minimises the transportation cost to a given document with respect to their semantic distributions. This cost is defined by the Wasserstein distance and used to measure the summary's semantic coverage of the original document. Comprehensive experiments on four challenging and widely used datasets (MultiNews, PubMed, BillSum, and CNN/DM) demonstrate that our proposed method outperforms the state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.

preprint · 2022 · arXiv

Probabilistic Model Checking for Strategic Equilibria-based Decision Making: Advances and Challenges

Game-theoretic concepts have been extensively studied in economics to provide insight into competitive behaviour and strategic decision making. As computing systems increasingly involve concurrently acting autonomous agents, game-theoretic approaches are becoming widespread in computer science as a faithful modelling abstraction. These techniques can be used to reason about the competitive or collaborative behaviour of multiple rational agents with distinct goals or objectives. This paper provides an overview of recent advances in developing a modelling, verification and strategy synthesis framework for concurrent stochastic games implemented in the probabilistic model checker PRISM-games. This is based on a temporal logic that supports finite- and infinite-horizon temporal properties in both a zero-sum and nonzero-sum setting, the latter using Nash and correlated equilibria with respect to two optimality criteria, social welfare and social fairness. We summarise the key concepts, logics and algorithms and the currently available tool support. Future challenges and recent progress in adapting the framework and algorithmic solutions to continuous environments and neural networks are also outlined.

preprint 2022 arXiv

RetroGraph: Retrosynthetic Planning with Graph Search

Retrosynthetic planning, which aims to find a reaction pathway to synthesize a target molecule, plays an important role in chemistry and drug discovery. This task is usually modeled as a search problem. Recently, data-driven methods have attracted much research interest and shown promising results for retrosynthetic planning. We observe that the same intermediate molecules are visited many times in the searching process, and they are usually independently treated in previous tree-based methods (e.g., AND-OR tree search, Monte Carlo tree search). Such redundancies make the search process inefficient. We propose a graph-based search policy that eliminates the redundant explorations of any intermediate molecules. As searching over a graph is more complicated than over a tree, we further adopt a graph neural network to guide the search over graphs. Meanwhile, our method can search a batch of targets together in the graph and remove the inter-target duplication in the tree-based search methods. Experimental results on two datasets demonstrate the effectiveness of our method. In particular, on the widely used USPTO benchmark, we improve the search success rate to 99.47%, advancing previous state-of-the-art performance by 2.6 points.

preprint 2022 arXiv

Strolling in Room-Scale VR: Hex-Core-MK1 Omnidirectional Treadmill

The natural locomotion interface is critical to the development of many VR applications. For household VR applications, there are two basic requirements: natural immersive experience and minimized space occupation. The existing locomotion strategies generally do not simultaneously satisfy these two requirements well. This paper presents a novel omnidirectional treadmill (ODT) system, named Hex-Core-MK1 (HCMK1). By implementing two kinds of mirror-symmetrical spiral rollers to generate the omnidirectional velocity field, the proposed system is capable of providing real walking experiences with full degrees of freedom in an area as small as 1.76 m^2, while delivering great advantages over several existing ODT systems in terms of weight, volume, latency and dynamic performance. Compared with the sizes of Infinadeck and HCP, the two best motor-driven ODTs so far, the 8 cm height of HCMK1 is only 20% of Infinadeck and 50% of HCP. In addition, HCMK1 is a lightweight device weighing only 110 kg, which provides possibilities of further expanding VR scenarios, such as terrain simulation. The latency of HCMK1 is only 23 ms. The experiments show that HCMK1 can deliver a starting acceleration of 16.00 m/s^2 and a braking acceleration of 30.00 m/s^2.

preprint 2022 arXiv

Target-aware Abstractive Related Work Generation with Contrastive Learning

The related work section is an important component of a scientific paper, which highlights the contribution of the target paper in the context of the reference papers. Authors can save time and effort by using the automatically generated related work section as a draft to complete the final related work. Most of the existing related work section generation methods rely on extracting off-the-shelf sentences to make a comparative discussion about the target work and the reference papers. However, such sentences need to be written in advance and are hard to obtain in practice. Hence, in this paper, we propose an abstractive target-aware related work generator (TAG), which can generate related work sections consisting of new sentences. Concretely, we first propose a target-aware graph encoder, which models the relationships between reference papers and the target paper with target-centered attention mechanisms. In the decoding process, we propose a hierarchical decoder that attends to the nodes of different levels in the graph with keyphrases as semantic indicators. Finally, to generate a more informative related work, we propose multi-level contrastive optimization objectives, which aim to maximize the mutual information between the generated related work and the references and minimize that with non-references. Extensive experiments on two public scholar datasets show that the proposed model brings substantial improvements over several strong baselines in terms of automatic and tailored human evaluations.

preprint 2022 arXiv

There Are a Thousand Hamlets in a Thousand People's Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory

Knowledge-grounded conversation (KGC) shows great potential in building an engaging and knowledgeable chatbot, and knowledge selection is a key ingredient in it. However, previous methods for knowledge selection only concentrate on the relevance between knowledge and dialogue context, ignoring the fact that the age, hobbies, education and life experience of an interlocutor have a major effect on his or her personal preference over external knowledge. Without taking the personalization issue into account, it is difficult to select the proper knowledge and generate persona-consistent responses. In this work, we introduce personal memory into knowledge selection in KGC to address the personalization issue. We propose a variational method to model the underlying relationship between one's personal memory and his or her selection of knowledge, and devise a learning scheme in which the forward mapping from personal memory to knowledge and its inverse mapping are included in a closed loop so that they can teach each other. Experiment results show that our method outperforms existing KGC methods significantly on both automatic evaluation and human evaluation.

preprint 2022 arXiv

Time Domain Adversarial Voice Conversion for ADD 2022

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022). Firstly, we build an any-to-many voice conversion (VC) system to convert source speech with arbitrary language content into the target speaker's fake speech. Then the converted speech generated from VC is post-processed in the time domain to improve the deception ability. The experimental results show that our system has adversarial ability against anti-spoofing detectors with a little compromise in audio quality and speaker similarity. This system ranks top in Track 3.1 in the ADD 2022, showing that our method could also gain good generalization ability against different detectors.

preprint 2021 arXiv

Correlative image learning of chemo-mechanics in phase-transforming solids

Constitutive laws underlie most physical processes in nature. However, learning such equations in heterogeneous solids (e.g., due to phase separation) is challenging. One such relationship is between composition and eigenstrain, which governs the chemo-mechanical expansion in solids. In this work, we developed a generalizable, physically-constrained image-learning framework to algorithmically learn the chemo-mechanical constitutive law at the nanoscale from correlative four-dimensional scanning transmission electron microscopy and X-ray spectro-ptychography images. We demonstrated this approach on Li$_X$FePO$_4$, a technologically-relevant battery positive electrode material. We uncovered the functional form of composition-eigenstrain relation in this two-phase binary solid across the entire composition range (0 $\leq$ X $\leq$ 1), including inside the thermodynamically-unstable miscibility gap. The learned relation directly validates Vegard's law of linear response at the nanoscale. Our physics-constrained data-driven approach directly visualizes the residual strain field (by removing the compositional and coherency strain), which is otherwise impossible to quantify. Heterogeneities in the residual strain arise from misfit dislocations and were independently verified by X-ray diffraction line profile analysis. Our work provides the means to simultaneously quantify chemical expansion, coherency strain and dislocations in battery electrodes, which has implications on rate capabilities and lifetime. Broadly, this work also highlights the potential of integrating correlative microscopy and image learning for extracting material properties and physics.

preprint 2021 arXiv

Experimental Analysis of PandaX-4T Cryogenic Distillation System for Removing Krypton from Xenon

An efficient cryogenic distillation system was designed and constructed for the PandaX-4T dark matter detector based on the McCabe-Thiele (M-T) method and the conservation of mass and energy. This distillation system is designed to reduce the concentration of krypton in commercial xenon from $5\times10^{-7}$ mol/mol to $10^{-14}$ mol/mol with 99% xenon collection efficiency at a maximum flow rate of 10 kg/h. The offline distillation operation has been completed and 5.75 tons of ultra-high purity xenon was produced, which is used as the detection medium in the PandaX-4T detector. The krypton concentration of the product xenon is measured with an upper limit of 8.0 ppt. The stability and purification performance of the cryogenic distillation system are studied by analyzing the experimental data, which is important for theoretical research and distillation operation optimization.

preprint 2021 arXiv

Neutronics Analysis for MSR Cell with Different Fuel Salt Channel Geometry

The neutronic properties of the Molten Salt Reactor (MSR) differ from those of traditional solid-fuel reactors because of the particular nature of its nuclear fuel. Based on the MCNP code, the influence of the size and shape of the fuel salt channel on the neutron physics of the MSR cell was studied systematically in this work. The results show that the infinite multiplication factor first increases and then decreases as the graphite cell size changes at a given fuel volume fraction (FVF). For the same FVF and average chord length, when the average chord length is relatively small, the k values for different fuel salt channel shapes are in good agreement; when the average chord length is relatively large, the k values for different fuel salt channel shapes differ greatly. In addition, some examples of practical application of this work are illustrated at the end, including cell selection for the core and thermal expansion displacement analysis of the cell.

preprint 2021 arXiv

Research on Brick Schema Representation for Building Operation with Variable Refrigerant Flow Systems

Building metadata is regarded as the signpost in organizing massive building data. The application of building metadata simplifies the creation of digital representations and provides portable data analytics. Typical metadata standards such as Brick and Haystack are used to describe the data of the building system. Brick uses standard ontologies to create building metadata. However, neither Haystack nor Brick has provided definitions for the Variable Refrigerant Flow (VRF) system so far. For years, both the Brick and Haystack working groups have been discussing how to describe VRF in their schemas, mainly regarding the classification of VRF and the definitions of VRF units. There have been no settled solutions for these problems. Meanwhile, the global VRF market is growing increasingly fast because of the energy efficiency and installation simplicity of the VRF system. Metadata that describes VRF units in buildings is needed for data analysis and management. Addressing this challenge, this paper extended Brick Schema with a VRF module and verified the Brick VRF module. Then, the model and the service framework were developed and applied to a building in China. The framework can serve portable energy analysis for different areas. The VRF module of this paper provides a possible solution for the expression of the VRF system in the building semantic web. The work in this paper will support the semantic web in automation strategies for building management and scalable building operation.

preprint 2020 arXiv

Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce

With the prosperity of cross-border e-commerce, there is an urgent demand for designing intelligent approaches for assisting e-commerce sellers to offer local products for consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task and the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.

preprint 2020 arXiv

EnsembleGAN: Adversarial Learning for Retrieval-Generation Ensemble Model on Short-Text Conversation

Generating high-quality responses has always been a challenge for human-computer dialogue systems. Existing dialogue systems generally derive from either retrieval-based or generative-based approaches, both of which have their own pros and cons. Despite the natural idea of an ensemble model of the two, existing ensemble methods only focused on leveraging one approach to enhance another; we argue, however, that they can be further mutually enhanced with a proper training strategy. In this paper, we propose ensembleGAN, an adversarial learning framework for enhancing a retrieval-generation ensemble model in the open-domain conversation scenario. It consists of a language-model-like generator, a ranker generator, and one ranker discriminator. Aiming at generating responses that approximate the ground-truth and receive high ranking scores from the discriminator, the two generators learn to generate improved highly relevant responses and competitive unobserved candidates respectively, while the discriminative ranker is trained to identify true responses from adversarial ones, thus featuring the merits of both generator counterparts. The experimental results on a large short-text conversation dataset demonstrate the effectiveness of ensembleGAN through improvements on both human and automatic evaluation metrics.

preprint 2020 arXiv

Epipolar Transformers

A common approach to localizing 3D human joints in a synchronized and calibrated multi-view setup consists of two steps: (1) apply a 2D detector separately on each view to localize joints in 2D, and (2) perform robust triangulation on 2D detections from each view to acquire the 3D joint locations. However, in step 1, the 2D detector is limited to solving challenging cases which could potentially be better resolved in 3D, such as occlusions and oblique viewing angles, purely in 2D without leveraging any 3D information. Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'. Experiments on InterHand and Human3.6M show that our approach has consistent improvements over the baselines. Specifically, in the condition where no external data is used, our Human3.6M model trained with ResNet-50 backbone and image size 256 x 256 outperforms state-of-the-art by 4.23 mm and achieves MPJPE 26.9 mm.
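
The epipolar constraint mentioned in this abstract can be sketched in a few lines. This is only the underlying geometry, not the epipolar transformer itself: for a fundamental matrix F, the match p' of a point p must lie on the line l' = F p in the neighboring view. The matrix `F_RECTIFIED` below is an assumed example (an idealized rectified stereo pair with purely horizontal camera translation), and the function names are hypothetical.

```python
def epipolar_line(F, p):
    """Return line coefficients (a, b, c) of l' = F @ [x, y, 1] in the other view."""
    x, y = p
    ph = (x, y, 1.0)  # homogeneous pixel coordinates
    return tuple(sum(F[r][c] * ph[c] for c in range(3)) for r in range(3))


def point_line_distance(line, q):
    """Perpendicular pixel distance from point q to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * q[0] + b * q[1] + c) / (a * a + b * b) ** 0.5


# Assumed fundamental matrix for a rectified pair: the cross-product matrix
# of the translation direction (1, 0, 0). Real calibrated rigs differ.
F_RECTIFIED = [
    [0, 0, 0],
    [0, 0, -1],
    [0, 1, 0],
]

line = epipolar_line(F_RECTIFIED, (120.0, 45.0))
print(line)                                      # coefficients of the line y' = 45
print(point_line_distance(line, (300.0, 45.0)))  # candidate on the line
print(point_line_distance(line, (300.0, 50.0)))  # candidate 5 px off the line
```

Restricting the search for p' to this line is what turns a 2D correspondence search into a 1D one; how features are sampled and matched along the line is specific to the paper and is not reproduced here.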

preprint 2020 arXiv

From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information

Text summarization is the research area aiming at creating a short and condensed version of the original document, which conveys the main idea of the document in a few words. This research topic has started to attract the attention of a large community of researchers, and it is nowadays counted as one of the most promising research areas. In general, text summarization algorithms aim at using a plain text document as input and then output a summary. However, in real-world applications, most of the data is not in a plain text format. Instead, there is much manifold information to be summarized, such as the summary for a web page based on a query in the search engine, extremely long documents (e.g., academic papers), dialog history and so on. In this paper, we focus on the survey of these new summarization tasks and approaches in real-world applications.

preprint 2020 arXiv

Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues

Building an intelligent dialogue system with the ability to select a proper response according to a multi-turn context is a highly challenging task. Existing studies focus on building a context-response matching model with various neural architectures or pre-trained language models (PLMs) and typically learning with a single response prediction task. These approaches overlook many potential training signals contained in dialogue data, which might be beneficial for context understanding and produce better features for response prediction. Besides, the responses retrieved from existing dialogue systems supervised in the conventional way still face some critical challenges, including incoherence and inconsistency. To address these issues, in this paper, we propose learning a context-response matching model with auxiliary self-supervised tasks designed for the dialogue data based on pre-trained language models. Specifically, we introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination, and jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner. By this means, the auxiliary tasks can guide the learning of the matching model to achieve a better local optimum and select a more proper response. Experiment results on two benchmarks indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection in retrieval-based dialogues, and our model achieves new state-of-the-art results on both datasets.

preprint 2020 arXiv

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Training generative models with a minimal corpus is one of the critical challenges for building open-domain dialogue systems. Existing methods tend to use the meta-learning framework, which pre-trains the parameters on all non-target tasks and then fine-tunes on the target task. However, fine-tuning distinguishes tasks from the parameter perspective but ignores the model-structure perspective, resulting in similar dialogue models for different tasks. In this paper, we propose an algorithm that can customize a unique dialogue model for each task in the few-shot setting. In our approach, each dialogue model consists of a shared module, a gating module, and a private module. The first two modules are shared among all the tasks, while the third one will differentiate into different network structures to better capture the characteristics of the corresponding task. Extensive experiments on two datasets show that our method outperforms all the baselines in terms of task consistency, response quality, and diversity.

preprint 2020 arXiv

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting a sticker response by matching text labels of stickers with previous utterances. However, due to their large quantities, it is impractical to require text labels for all the stickers. Hence, in this paper, we propose to recommend an appropriate sticker to the user based on multi-turn dialog context history without any external labels. Two main challenges are confronted in this task. One is to learn the semantic meaning of stickers without corresponding text labels. Another challenge is to jointly model the candidate sticker with the multi-turn dialog context. To tackle these challenges, we propose a sticker response selector (SRS) model. Specifically, SRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain the representation of stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance in the dialog history. SRS then learns the short-term and long-term dependency between all interaction results by a fusion network to output the final matching score. To evaluate our proposed method, we collect a large-scale real-world dialog dataset with stickers from one of the most popular online chatting platforms. Extensive experiments conducted on this dataset show that our model achieves state-of-the-art performance for all commonly used metrics. Experiments also verify the effectiveness of each component of SRS. To facilitate further research in the sticker selection field, we release this dataset of 340K multi-turn dialog and sticker pairs.

preprint 2020 arXiv

Low-Resource Knowledge-Grounded Dialogue Generation

Responding with knowledge has been recognized as an important capability for an intelligent conversational agent. Yet knowledge-grounded dialogues, as training data for learning such a response generation model, are difficult to obtain. Motivated by the challenge in practice, we consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of ungrounded dialogues and unstructured documents, while the small number of remaining parameters can be well fitted using the limited training examples. Evaluation results on two benchmarks indicate that with only 1/8 of the training data, our model can achieve state-of-the-art performance and generalize well on out-of-domain knowledge.

preprint 2020 arXiv

Matching-Based Capture Strategies for 3D Heterogeneous Multiplayer Reach-Avoid Differential Games

This paper studies a 3D multiplayer reach-avoid differential game with a goal region and a play region. Multiple pursuers defend the goal region by consecutively capturing multiple evaders in the play region. The players have heterogeneous moving speeds and the pursuers have heterogeneous capture radii. Since this game is hard to analyze directly, we decompose the whole game into many subgames which involve multiple pursuers and only one evader. Then, these subgames are used as building blocks for the pursuer-evader matching. First, for multiple pursuers and one evader, we introduce an evasion space (ES) method characterized by a potential function to construct a guaranteed pursuer winning strategy. Then, based on this strategy, we develop conditions to determine whether a pursuit team can guard the goal region against one evader. It is shown that in 3D, if a pursuit team is able to defend the goal region against an evader, then at most three pursuers in the team are necessarily needed. We also compute the value function of the Hamilton-Jacobi-Isaacs (HJI) equation for a special subgame of degree. To capture the maximum number of evaders in the open-loop sense, we formulate a maximum bipartite matching problem with conflict graph (MBMC). We show that the MBMC is NP-hard and design a polynomial-time constant-factor approximation algorithm to solve it. Finally, we propose a receding horizon strategy for the pursuit team where in each horizon an MBMC is solved and the strategies of the pursuers are given. We also extend our results to the case of a bounded convex play region where the evaders escape through an exit. Two numerical examples are provided to demonstrate the obtained results.
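
For orientation, the sketch below solves the plain maximum bipartite matching problem with augmenting paths (Kuhn's algorithm). It deliberately omits the conflict graph, so it is not the MBMC from the abstract nor the authors' approximation algorithm; it also assumes one pursuer per evader, which is a simplification. The `feasible` lists are made-up example data.

```python
def maximum_bipartite_matching(can_capture):
    """can_capture[p] lists the evader indices that pursuer p could capture.

    Returns a dict mapping each matched pursuer to its assigned evader.
    """
    match_of_evader = {}  # evader index -> pursuer index

    def try_assign(p, visited):
        # Depth-first search for an augmenting path starting at pursuer p.
        for e in can_capture[p]:
            if e in visited:
                continue
            visited.add(e)
            # Take a free evader, or re-route the pursuer currently holding it.
            if e not in match_of_evader or try_assign(match_of_evader[e], visited):
                match_of_evader[e] = p
                return True
        return False

    for p in range(len(can_capture)):
        try_assign(p, set())
    return {p: e for e, p in match_of_evader.items()}


# Hypothetical feasibility lists: pursuer 0 can reach evaders 0 and 1, etc.
feasible = [[0, 1], [0], [1, 2]]
matching = maximum_bipartite_matching(feasible)
print(matching)
```

Without conflicts this problem is solvable exactly in polynomial time; according to the abstract, it is the added conflict graph that makes the MBMC NP-hard and motivates an approximation algorithm.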

preprint 2020 arXiv

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

This paper introduces two metrics (cycle-based and memory-based metrics), grounded on a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank which relies on weakly better responses. We first consider settings where the difference between largest and second largest underlying metric has a known lower bound. With this knowledge we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimal by any given tolerance. The proposed perturbed SBRD addresses the opponent-induced non-stationarity by fixing the strategies of others for the learning agent, and uses empirical game-theoretic analysis to estimate payoffs for each strategy profile obtained due to the perturbation.

preprint 2020 arXiv

RPM-Oriented Query Rewriting Framework for E-commerce Keyword-Based Sponsored Search

Sponsored search optimizes revenue and relevance, which is estimated by Revenue Per Mille (RPM). Existing sponsored search models are all based on traditional statistical models, which have poor RPM performance when queries follow a heavy-tailed distribution. Here, we propose an RPM-oriented Query Rewriting Framework (RQRF) which outputs related bid keywords that can yield high RPM. RQRF embeds both queries and bid keywords as vectors in the same implicit space, converting the rewriting probability between each query and keyword to the distance between the two vectors. For label construction, we propose an RPM-oriented sample construction method, labeling keywords based on whether or not they can lead to high RPM. Extensive experiments are conducted to evaluate the performance of RQRF. On one month of large-scale real-world traffic from an e-commerce sponsored search system, the proposed model significantly outperforms the traditional baseline.

preprint 2020 arXiv

Social Adaptive Module for Weakly-supervised Group Activity Recognition

This paper presents a new task named weakly-supervised group activity recognition (GAR), which differs from conventional GAR tasks in that only video-level labels are available, and the important persons within each frame are not provided even in the training data. This makes it easier to collect and annotate a large-scale NBA dataset and thus raises new challenges for GAR. To mine useful information from weak supervision, we present a key insight that key instances are likely to be related to each other, and thus design a social adaptive module (SAM) to reason about key persons and frames from noisy data. Experiments show significant improvement on the NBA dataset as well as the popular volleyball dataset. In particular, our model trained on video-level annotation achieves accuracy comparable to that of prior algorithms which required strong labels.