Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
37 works
0 followers
32 topics
4 close collaborators

Published work

37 published items

preprint, 2026, arXiv

Evolutionary Ensemble of Agents

We introduce Evolutionary Ensemble (EvE), a decentralized framework that organizes existing, highly capable coding agents into a live, co-evolving system for algorithmic discovery. Rather than reinventing the wheel within the "LLMs as optimizers" paradigm, EvE fixes the base agent substrate and focuses entirely on evolving the cumulative guidance and skills that dictate agent behaviors. By maintaining two co-evolving populations, namely functional code solvers and agent guidance states, the system evaluates agents through a synchronous race, updating their empirical Elo ratings based on the marginal gains they contribute to the current solver state. When applied to a research bottleneck in In-Context Operator Networks (ICON), EvE autonomously discovered a robust rescale-then-interpolate mechanism that enables reliable example-count generalization. Crucially, controlled ablations reveal the absolute necessity of stage-dependent agent adaptation to navigate the shifting search landscapes of complex codebases. Compared to variants driven by a fixed initial agent or even a frozen "best-evolved" agent, EvE uniquely avoids phase mismatch, demonstrating that organizing agents into a self-revising ensemble is the fundamental driver for breaking through static performance ceilings.
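The abstract mentions Elo ratings updated from the marginal gains that agents contribute in a race. As a rough illustration only, the sketch below applies the standard pairwise Elo update to two agents whose "gains" are compared head to head. The function names, the K-factor of 32, and the rule that converts gains into a win/draw/loss score are assumptions made for this example; the abstract does not specify how EvE actually computes its ratings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of agent A against agent B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def score_from_gains(gain_a: float, gain_b: float) -> float:
    """Hypothetical rule: the agent with the larger marginal gain wins."""
    if gain_a > gain_b:
        return 1.0
    if gain_a < gain_b:
        return 0.0
    return 0.5


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k is an assumed K-factor, not a value taken from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two agents start level; agent A contributed the larger marginal gain.
new_a, new_b = elo_update(1500.0, 1500.0, score_from_gains(0.12, 0.05))
```

The update is zero-sum, so the total rating in the population stays constant, and an upset against a higher-rated agent moves ratings more than an expected win does.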

preprint, 2026, arXiv

One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning

In Large Language Model (LLM) fine-tuning, parameter and data selection are common strategies for reducing fine-tuning cost, yet they are typically driven by separate scoring mechanisms. When a parameter mask and data subset jointly determine restricted fine-tuning, this separation incurs redundant overhead and makes coordinated selection difficult. We cast parameter and data selection as two bilevel selection problems under a common validation objective and derive a shared local response-surrogate scoring rule. Under first- and second-order validation-improvement approximations, parameter importance and data utility emerge as column-wise and row-wise aggregations of a single gradient interaction matrix, yielding a closed-form row-column correspondence for co-extracting both signals. Building on this structure, we propose DualSFT (Dual-Selection Fine-Tuning), a one-shot dual-scoring algorithm that produces a parameter mask and data subset from shared gradient statistics. On 3B-9B LLMs, single-axis DualSFT variants strengthen target-task performance and stability-plasticity trade-offs within their comparison groups, while full DualSFT yields a more favorable joint-constrained trade-off than sequential hybrid baselines under matched budgets.
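The row-column correspondence described in the abstract can be pictured with a small first-order sketch. Assume, for illustration, that rows index training examples, columns index parameters, and each matrix entry is the product of a per-example training gradient component and the matching validation gradient component. Row sums then score examples and column sums score parameters. The entry definition, the toy numbers, and the `top_k` helper are assumptions for this example, not the DualSFT scoring rule itself.

```python
def interaction_matrix(per_example_grads, val_grad):
    """Entry (i, j) = training gradient of example i at parameter j
    times the validation gradient at parameter j (assumed definition)."""
    return [[g * v for g, v in zip(row, val_grad)] for row in per_example_grads]


def row_scores(matrix):
    """Row-wise aggregation: one utility score per training example."""
    return [sum(row) for row in matrix]


def col_scores(matrix):
    """Column-wise aggregation: one importance score per parameter."""
    return [sum(col) for col in zip(*matrix)]


def top_k(scores, k):
    """Indices of the k largest scores, returned in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy data: 2 training examples, 3 parameters.
grads = [[1.0, 0.0, 2.0],
         [0.5, -1.0, 0.0]]
val_grad = [2.0, 1.0, 0.5]

M = interaction_matrix(grads, val_grad)
data_utility = row_scores(M)        # one score per example
param_importance = col_scores(M)    # one score per parameter

data_subset = top_k(data_utility, 1)         # keep the best example
param_mask = top_k(param_importance, 2)      # keep the two best parameters
```

Both selections come from a single pass over the same matrix, which is the point of sharing gradient statistics between the two axes.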

preprint, 2026, arXiv

VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multiphysics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines: DPOT and MPP, reducing the averaged last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in imperfect measurement systems where sampling frequencies may differ or frames might be dropped - common challenges in real-world settings - without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41% relative performance degradation compared to 71.37%-74.49% degradation in baseline methods, demonstrating its versatility for deploying in realistic applications. Our scripts for processing datasets and code are publicly available at https://github.com/Eydcao/VICON.

preprint, 2025, arXiv

A Tale of Two Shocks

Energetic particles in interplanetary space are normally measured at time scales that are long compared to the ion gyroperiod. Such observations by necessity average out the microphysics associated with the acceleration and transport of 10s - 100s keV particles. We investigate previously unseen non-equilibrium features that only become observable at very high time resolution, and discuss possible explanations of these features. We use unprecedentedly high-time-resolution data that were acquired by the in situ instruments on Solar Orbiter in the vicinity of two interplanetary shocks observed on 2023-11-29 07:51:17 UTC and 2023-11-30 10:47:26 UTC at $\sim 0.83$ astronomical units from the Sun. The solar-wind proton beam population follows the magnetic field instantaneously, on time scales which are significantly shorter than a gyro-period. Energetic particles, despite sampling large volumes of space, vary on remarkably short time scales, typically on the order of the convection time of their gyro-radius. Non-equilibrium features such as bump-on-tail distributions of energetic particles are formed by small-scale magnetic structures in the IMF. High-time-resolution observations show previously unobserved microphysics in the vicinity of two traveling interplanetary shocks, including ion reflection at a current sheet, which may explain where ions are reflected in shock acceleration.

preprint, 2025, arXiv

Interface-Controlled Antiferromagnetic Tunnel Junctions based on a metallic van der Waals A-type Antiferromagnet

Magnetic tunnel junctions (MTJs) are crucial components in high-performance spintronic devices. Traditional MTJs rely on ferromagnetic (FM) materials but significant improvements in speed and packing density could be enabled by exploiting antiferromagnetic (AFM) compounds instead. Here, we report all-collinear AFM tunnel junctions (AFMTJs) fabricated with van der Waals A-type AFM metal (Fe0.6Co0.4)5GeTe2 (FCGT) electrodes and nonmagnetic semiconducting WSe2 tunnel barriers. The AFMTJ heterostructure device achieves a tunneling magnetoresistance (TMR) ratio of up to 75% in response to magnetic field switching. Our results demonstrate that the TMR exclusively emerges in the AFM state of FCGT, rather than during the AFM-to-FM transition. By engineering FCGT electrodes with either even- or odd-layer configurations, volatile or non-volatile TMR could be selected, consistent with an entirely interfacial effect. TMR in the even-layer devices arose by Néel vector switching. In the odd-layer devices, TMR stemmed from interfacial spin-flipping. Experimental and theoretical analyses reveal a new TMR mechanism associated with interface-driven spin-polarized transport, despite the spin-independent nature of bulk FCGT. Our work demonstrates that collinear AFMTJs can provide comparable performance to conventional MTJs and introduces a new paradigm for AFM spintronics, in which the spin-dependent properties of AFM interfaces are harnessed.

preprint, 2024, arXiv

Nonvolatile optical control of interlayer stacking order in 1T-TaS2

Nonvolatile optical manipulation of material properties on demand is a highly sought-after feature in the advancement of future optoelectronic applications. While the discovery of such metastable transition in various materials holds good promise for achieving this goal, their practical implementation is still in the nascent stage. Here, we unravel the nature of the ultrafast laser-induced hidden state in 1T-TaS2 by systematically characterizing the electronic structure evolution throughout the reversible transition cycle. We identify it as a mixed-stacking state involving two similarly low-energy interlayer orders, which is manifested as the charge density wave phase disruption. Furthermore, our comparative experiments utilizing the single-pulse writing, pulse-train erasing and pulse-pair control explicitly reveal the distinct mechanism of the bidirectional transformations -- the ultrafast formation of the hidden state is initiated by a coherent phonon which triggers a competition of interlayer stacking orders, while its recovery to the initial state is governed by the progressive domain coarsening. Our work highlights the deterministic role of the competing interlayer orders in the nonvolatile phase transition in the layered material 1T-TaS2, and promises the coherent control of the phase transition and switching speed. More importantly, these results establish all-optical engineering of stacking orders in low-dimensional materials as a viable strategy for achieving desirable nonvolatile electronic devices.

preprint, 2022, arXiv

A Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Industrial Wireless Sensor Networks

Security is one of the major concerns in Industrial Wireless Sensor Networks (IWSNs). To assure the security in clustered IWSNs, this paper presents a secure clustering protocol with fuzzy trust evaluation and outlier detection (SCFTO). Firstly, to deal with the transmission uncertainty in an open wireless medium, an interval type-2 fuzzy logic controller is adopted to estimate the trusts. And then a density based outlier detection mechanism is introduced to acquire an adaptive trust threshold used to isolate the malicious nodes from being cluster heads. Finally, a fuzzy based cluster heads election method is proposed to achieve a balance between energy saving and security assurance, so that a normal sensor node with more residual energy or less confidence on other nodes has higher probability to be the cluster head. Extensive experiments verify that our secure clustering protocol can effectively defend the network against attacks from internal malicious or compromised nodes.

preprint, 2022, arXiv

An Evolutionary Game based Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Wireless Sensor Networks

Trustworthy and reliable data delivery is a challenging task in Wireless Sensor Networks (WSNs) due to unique characteristics and constraints. To acquire secured data delivery and address the conflict between security and energy, in this paper we present an evolutionary game based secure clustering protocol with fuzzy trust evaluation and outlier detection for WSNs. Firstly, a fuzzy trust evaluation method is presented to transform the transmission evidences into trust values while effectively alleviating the trust uncertainty. And then, a K-Means based outlier detection scheme is proposed to further analyze plenty of trust values obtained via fuzzy trust evaluation or trust recommendation. It can discover the commonalities and differences among sensor nodes while improving the accuracy of outlier detection. Finally, we present an evolutionary game based secure clustering protocol to achieve a trade-off between security assurance and energy saving for sensor nodes when electing for the cluster heads. A sensor node which failed to be the cluster head can securely choose its own head by isolating the suspicious nodes. Simulation results verify that our secure clustering protocol can effectively defend the network against the attacks from internal selfish or compromised nodes. Correspondingly, the timely data transfer rate can be improved significantly.

preprint, 2022, arXiv

An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things

Edge enabled Industrial Internet of Things (IIoT) platform is of great significance to accelerate the development of smart industry. However, with the dramatic increase in real-time IIoT applications, it is a great challenge to support fast response time, low latency, and efficient bandwidth utilization. To address this issue, Time Sensitive Network (TSN) is recently researched to realize low latency communication via deterministic scheduling. To the best of our knowledge, the combinability of multiple flows, which can significantly affect the scheduling performance, has never been systematically analyzed before. In this article, we first analyze the combinability problem. Then a non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for the time-sensitive flows. Moreover, to improve bandwidth utilization, a dynamic queue scheduling (DQS) method is presented for the best-effort flows. Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.

preprint, 2022, arXiv

An Intelligent Trust Cloud Management Method for Secure Clustering in 5G enabled Internet of Medical Things

5G edge computing enabled Internet of Medical Things (IoMT) is an efficient technology to provide decentralized medical services while Device-to-device (D2D) communication is a promising paradigm for future 5G networks. To assure secure and reliable communication in 5G edge computing and D2D enabled IoMT systems, this paper presents an intelligent trust cloud management method. Firstly, an active training mechanism is proposed to construct the standard trust clouds. Secondly, individual trust clouds of the IoMT devices can be established through fuzzy trust inferring and recommending. Thirdly, a trust classification scheme is proposed to determine whether an IoMT device is malicious. Finally, a trust cloud update mechanism is presented to make the proposed trust management method adaptive and intelligent under an open wireless medium. Simulation results demonstrate that the proposed method can effectively address the trust uncertainty issue and improve the detection accuracy of malicious devices.

preprint, 2022, arXiv

Downwash-aware Control Allocation for Over-actuated UAV Platforms

Tracking position and orientation independently affords more agile maneuver for over-actuated multirotor Unmanned Aerial Vehicles (UAVs) while introducing undesired downwash effects; downwash flows generated by thrust generators may counteract others due to close proximity, which significantly threatens the stability of the platform. The complexity of modeling aerodynamic airflow challenges control algorithms from properly compensating for such a side effect. Leveraging the input redundancies in over-actuated UAVs, we tackle this issue with a novel control allocation framework that considers downwash effects and explores the entire allocation space for an optimal solution. This optimal solution avoids downwash effects while providing high thrust efficiency within the hardware constraints. To the best of our knowledge, ours is the first formal derivation to investigate the downwash effects on over-actuated UAVs. We verify our framework on different hardware configurations in both simulation and experiment.

preprint, 2022, arXiv

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Bulk synchronous parallel (BSP) is the de-facto paradigm for distributed DNN training in today's production clusters. However, due to the global synchronization nature, its performance can be significantly influenced by network bottlenecks caused by either static topology heterogeneity or dynamic bandwidth contentions. Existing solutions, either system-level optimizations strengthening BSP (e.g., Ring or Hierarchical All-reduce) or algorithmic optimizations replacing BSP (e.g., ASP or SSP, which relax the global barriers), do not completely solve the problem, as they may still suffer from communication inefficiency or risk convergence inaccuracy. In this paper, we present a novel divide-and-shuffle synchronization (DS-Sync) to realize communication efficiency without sacrificing convergence accuracy for distributed DNN training. At its heart, by taking into account the network bottlenecks, DS-Sync improves communication efficiency by dividing workers into non-overlap groups to synchronize independently in a bottleneck-free manner. Meanwhile, it maintains convergence accuracy by iteratively shuffling workers among different groups to ensure a global consensus. We theoretically prove that DS-Sync converges properly in non-convex and smooth conditions like DNN. We further implement DS-Sync and integrate it with PyTorch, and our testbed experiments show that DS-Sync can achieve up to $94\%$ improvements on the end-to-end training time with existing solutions while maintaining the same accuracy.
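The divide-and-shuffle idea can be illustrated with a toy simulation: workers are split into non-overlapping groups, each group averages its values independently, and group membership is reshuffled every round so that information spreads across the whole cluster. The sketch below tracks a single scalar per worker and omits local training steps, network topology, and any bottleneck-aware grouping, so it is a simplified picture under those assumptions rather than the DS-Sync implementation. The function name, group size, and seed are hypothetical.

```python
import random


def divide_and_shuffle_round(values, group_size, rng):
    """One round: shuffle workers, split them into non-overlapping groups,
    and replace each worker's value with its group's mean."""
    order = list(range(len(values)))
    rng.shuffle(order)  # new random group membership each round
    new_values = list(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            new_values[i] = mean
    return new_values


rng = random.Random(0)                      # fixed seed for repeatability
params = [float(i) for i in range(8)]       # one scalar "model" per worker
initial_mean = sum(params) / len(params)

# Track the spread (max - min) across workers after every round.
spread_history = [max(params) - min(params)]
for _ in range(20):
    params = divide_and_shuffle_round(params, group_size=4, rng=rng)
    spread_history.append(max(params) - min(params))
```

Averaging within a group never changes the sum over all workers, so the global mean is preserved while the spread between workers can only shrink, which is the intuition behind reaching a global consensus without a global barrier.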

preprint, 2022, arXiv

Generative Adversarial Learning for Intelligent Trust Management in 6G Wireless Networks

The emerging sixth generation (6G) is the integration of heterogeneous wireless networks, which can seamlessly support anywhere and anytime networking. But high Quality-of-Trust should be offered by 6G to meet mobile user expectations. Artificial intelligence (AI) is considered one of the most important components in 6G, so AI-based trust management is a promising paradigm to provide trusted and reliable services. In this article, a generative adversarial learning-enabled trust management method is presented for 6G wireless networks. Some typical AI-based trust management schemes are first reviewed, and then a potential heterogeneous and intelligent 6G architecture is introduced. Next, the integration of AI and trust management is developed to optimize the intelligence and security. Finally, the presented AI-based trust management method is applied to secure clustering to achieve reliable and real-time communications. Simulation results have demonstrated its excellent performance in guaranteeing network security and service quality.

preprint, 2022, arXiv

High Spatial and Temporal Resolution NIR-IIb Gastrointestinal Imaging in Mice

Conventional biomedical imaging modalities, including endoscopy, X-rays, and magnetic resonance, are invasive and cannot provide sufficient spatial and temporal resolutions for regular imaging of the gastrointestinal (GI) tract to guide prognosis and therapy of GI diseases. Here we report a non-invasive method for optical imaging of the GI tract. It is based on a new type of lanthanide-doped nanocrystal with near-infrared (NIR) excitation at 980 nm and second NIR window (NIR-IIb) (1500~1700 nm) fluorescence emission at around 1530 nm. The rational design and controlled synthesis of nanocrystals with high brightness have led to an absolute quantum yield (QY) up to 48.6%. Further benefitting from the minimized scattering through the NIR-IIb window, we enhanced the spatial resolution by 3 times compared with the other NIR-IIa (1000~1500 nm) contrast agents for GI tract imaging. The approach also led to a high temporal resolution of 8 frames per second, so that mouse intestinal peristalsis occurring within one minute can be captured. Furthermore, with a light-sheet imaging system, we demonstrated three-dimensional (3D) imaging of the stereoscopic structure of the GI tract. Moreover, we successfully translate these advances to diagnose inflammatory bowel disease (IBD) in a pre-clinical mouse model of colitis.

preprint, 2022, arXiv

Inflation-induced aneurysm formation and evolution in graded cylindrical tubes of arbitrary thickness

We study the initiation and evolution of aneurysmal morphology in a pressurized soft tube where the elastic modulus is non-uniform in the radial direction. The primary deformation prior to instability is characterized within the framework of nonlinear elasticity for a general material constitution and a generic modulus gradient. To unravel the influence of modulus gradient on aneurysm formation, we employ the incompressible Gent model and select three representative modulus gradients, including a linear, an exponential, and a sinusoidal function. In particular, the sinusoidal distribution can be used to model actual artery structure. In addition, two prototypical loading conditions are considered, namely, either the resultant axial force or the axial length can be fixed. Based on an explicit bifurcation condition in terms of the internal pressure and the resultant axial force for aneurysm formation or localized bulging, an exhaustive theoretical analysis on bulge initiation is carried out and the effect of geometric and material parameters and modulus gradient on the critical stretch generating localized bulging is revealed. It turns out that the modulus mismatch, as well as the position of maximum modulus, can dramatically affect the onset of localized bulging. Then we analytically elucidate the influence of modulus gradient on bulge propagation and conduct a finite element analysis of bulge evolution based on a robust finite element model established in Abaqus by UHYPER subroutine coding. Interestingly, it is found that a sinusoidally distributed modulus has negligible influence on the critical stretch of bulge initiation, the deformation process of bulge growth, and the maximum size of a bulge. The current analysis can provide useful insight into the biological evolution of human arteries and into localized instabilities in graded structures.

preprint, 2022, arXiv

Phonon-mediated superconductivity in two-dimensional hydrogenated phosphorus carbide: HPC$_{3}$

In recent years, three-dimensional (3D) high-temperature superconductors at ultrahigh pressure have been reported; typical examples are the polyhydrides H$_{3}$S, LaH$_{10}$, and YH$_{9}$. Finding two-dimensional (2D) high-temperature superconductors at atmospheric pressure is another research hotspot. Here, we investigated the possible superconductivity in a hydrogenated monolayer phosphorus carbide based on first-principles calculations. The results reveal that monolayer PC$_{3}$ transforms from a semiconductor to a metal after hydrogenation. Interestingly, the C-$π$-bonding band contributes most to the states at the Fermi level. Based on the electron-phonon coupling mechanism, it is found that the electron-phonon coupling constant of HPC$_{3}$ is 0.95, which mainly originates from the coupling of C-$π$ electrons with the in-plane vibration modes of C and H. The calculated critical temperature $T_{c}$ is 31.0 K, which is higher than that of most 2D superconductors. By further applying biaxial tensile strain of 3$\%$, the $T_{c}$ can be boosted to 57.3 K, exceeding the McMillan limit. Thus, hydrogenation and strain are effective ways of increasing the superconducting $T_{c}$ of 2D materials.

preprint, 2022, arXiv

Practical and Secure Federated Recommendation with Personalized Masks

Federated recommendation addresses the data silo and privacy problems altogether for recommender systems. Current federated recommender systems mainly utilize cryptographic or obfuscation methods to protect the original ratings from leakage. However, the former comes with extra communication and computation costs, and the latter damages model accuracy. Neither of them can simultaneously satisfy the real-time feedback and accurate personalization requirements of recommender systems. In this paper, we propose federated masked matrix factorization (FedMMF) to protect data privacy in federated recommender systems without sacrificing efficiency and effectiveness. In more detail, we introduce the new idea of a personalized mask generated only from local data and apply it in FedMMF. On the one hand, the personalized mask offers protection for participants' private data without effectiveness loss. On the other hand, combined with the adaptive secure aggregation protocol, the personalized mask could further improve efficiency. Theoretically, we provide a security analysis for the personalized mask. Empirically, we also show the superiority of the designed model on different real-world data sets.

preprint, 2022, arXiv

Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

With the enactment of privacy-preserving regulations, e.g., GDPR, federated SVD is proposed to enable SVD-based applications over different data sources without revealing the original data. However, many SVD-based applications cannot be well supported by existing federated SVD solutions. The crux is that these solutions, adopting either differential privacy (DP) or homomorphic encryption (HE), suffer from accuracy loss caused by unremovable noise or degraded efficiency due to inflated data. In this paper, we propose FedSVD, a practical lossless federated SVD method over billion-scale data, which can simultaneously achieve lossless accuracy and high efficiency. At the heart of FedSVD is a lossless matrix masking scheme delicately designed for SVD: 1) While adopting the masks to protect private data, FedSVD completely removes them from the final results of SVD to achieve lossless accuracy; and 2) As the masks do not inflate the data, FedSVD avoids extra computation and communication overhead during the factorization to maintain high efficiency. Experiments with real-world datasets show that FedSVD is over 10000 times faster than the HE-based method and has 10 orders of magnitude smaller error than the DP-based solution on SVD tasks. We further build and evaluate FedSVD over three real-world applications: principal components analysis (PCA), linear regression (LR), and latent semantic analysis (LSA), to show its superior performance in practice. On federated LR tasks, compared with two state-of-the-art solutions: FATE and SecureML, FedSVD-LR is 100 times faster than SecureML and 10 times faster than FATE.

preprint, 2022, arXiv

Rare Gems: Finding Lottery Tickets at Initialization

Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle et al. and Su et al. presents concrete evidence that current algorithms for finding trainable networks at initialization, fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to better accuracy compared to simple baselines remains an open problem. In this work, we resolve this open problem by proposing Gem-Miner which finds lottery tickets at initialization that beat current baselines. Gem-Miner finds lottery tickets trainable to accuracy competitive or better than Iterative Magnitude Pruning (IMP), and does so up to $19\times$ faster.

preprint, 2022, arXiv

Secure Forward Aggregation for Vertical Federated Neural Networks

Vertical federated learning (VFL) is attracting much attention because it enables cross-silo data cooperation in a privacy-preserving manner. While most research works in VFL focus on linear and tree models, deep models (e.g., neural networks) are not well studied in VFL. In this paper, we focus on SplitNN, a well-known neural network framework in VFL, and identify a trade-off between data security and model performance in SplitNN. Briefly, SplitNN trains the model by exchanging gradients and transformed data. On the one hand, SplitNN suffers from the loss of model performance since multiple parties jointly train the model using transformed data instead of raw data, and a large amount of low-level feature information is discarded. On the other hand, a naive solution of increasing the model performance through aggregating at lower layers in SplitNN (i.e., the data is less transformed and more low-level features are preserved) makes raw data vulnerable to inference attacks. To mitigate the above trade-off, we propose a new neural network protocol in VFL called Security Forward Aggregation (SFA). It changes the way of aggregating the transformed data and adopts removable masks to protect the raw data. Experiment results show that networks with SFA achieve both data security and high model performance.

preprint, 2022, arXiv

SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space

In this work, we formulate a visual dialog as an information flow in which each piece of information is encoded with the joint visual-linguistic representation of a single dialog round. Based on this formulation, we consider the visual dialog task as a sequence problem consisting of ordered visual-linguistic vectors. For featurization, we use a Dense Symmetric Co-Attention network as a lightweight vision-language joint representation generator to fuse multimodal features (i.e., image and text), yielding better computation and data efficiencies. For inference, we propose two Sequential Dialog Networks (SeqDialN): the first uses LSTM for information propagation (IP) and the second uses a modified Transformer for multi-step reasoning (MR). Our architecture separates the complexity of multimodal feature fusion from that of inference, which allows simpler design of the inference engine. IP based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance. MR based SeqDialN, on the other hand, recurrently refines the semantic question/history representations through the self-attention stack of Transformer and produces promising results on the visual dialog task. On VisDial v1.0 test-std dataset, our best single generative SeqDialN achieves 62.54% NDCG and 48.63% MRR; our ensemble generative SeqDialN achieves 63.78% NDCG and 49.98% MRR, which sets a new state of the art for generative visual dialog models. We fine-tune discriminative SeqDialN with dense annotations and boost the performance up to 72.41% NDCG and 55.11% MRR. In this work, we discuss the extensive experiments we have conducted to demonstrate the effectiveness of our model components. We also provide visualization for the reasoning process from the relevant conversation rounds and discuss our fine-tuning methods. Our code is available at https://github.com/xiaoxiaoheimei/SeqDialN

preprint, 2021, arXiv

Flow-based Generative Models for Learning Manifold to Manifold Mappings

Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there are also no modality transfer/translation models for manifold-valued data whereas numerous such methods based on generative models are available for natural images. This paper addresses this gap, motivated by a need in brain imaging -- in doing so, we expand the operating range of certain generative models (as well as generative models for modality transfer) from natural images to images with manifold-valued measurements. Our main result is the design of a two-stream version of GLOW (flow-based invertible generative models) that can synthesize information of a field of one type of manifold-valued measurements given another. On the theoretical side, we introduce three kinds of invertible layers for manifold-valued data, which are not only analogous to their functionality in flow-based generative models (e.g., GLOW) but also preserve the key benefits (determinants of the Jacobian are easy to calculate). For experiments, on a large dataset from the Human Connectome Project (HCP), we show promising results where we can reliably and accurately reconstruct brain images of a field of orientation distribution functions (ODF) from diffusion tensor images (DTI), where the latter has a $5\times$ faster acquisition time but at the expense of worse angular resolution.

preprint2021arXiv

Learning Functional Priors and Posteriors from Data and Physics

We develop a new Bayesian framework based on deep neural networks to be able to extrapolate in space-time using historical data and to quantify uncertainties arising from both noisy and gappy data in physical problems. Specifically, the proposed approach has two stages: (1) prior learning and (2) posterior estimation. At the first stage, we employ the physics-informed Generative Adversarial Networks (PI-GAN) to learn a functional prior either from a prescribed function distribution, e.g., Gaussian process, or from historical data and physics. At the second stage, we employ the Hamiltonian Monte Carlo (HMC) method to estimate the posterior in the latent space of PI-GANs. In addition, we use two different approaches to encode the physics: (1) automatic differentiation, used in the physics-informed neural networks (PINNs) for scenarios with explicitly known partial differential equations (PDEs), and (2) operator regression using the deep operator network (DeepONet) for PDE-agnostic scenarios. We then test the proposed method for (1) meta-learning for one-dimensional regression, and forward/inverse PDE problems (combined with PINNs); (2) PDE-agnostic physical problems (combined with DeepONet), e.g., fractional diffusion as well as saturated stochastic (100-dimensional) flows in heterogeneous porous media; and (3) spatial-temporal regression problems, i.e., inference of a marine riser displacement field. The results demonstrate that the proposed approach can provide accurate predictions as well as uncertainty quantification given very limited scattered and noisy data, since historical data could be available to provide informative priors. In summary, the proposed method is capable of learning flexible functional priors, and can be extended to big data problems using stochastic HMC or normalizing flows since the latent space is generally characterized as low dimensional.

preprint2021arXiv

Measure-conditional Discriminator with Stationary Optimum for GANs and Statistical Distance Surrogates

We propose a simple but effective modification of the discriminators, namely measure-conditional discriminators, as a plug-and-play module for different GANs. By taking the generated distributions as part of input so that the target optimum for the discriminator is stationary, the proposed discriminator is more robust than the vanilla one. A variant of the measure-conditional discriminator can also handle multiple target distributions, or act as a surrogate model of statistical distances such as KL divergence with applications to transfer learning.

preprint2021arXiv

Robust propagation of internal coastal Kelvin waves in complex domains

We experimentally investigate internal coastal Kelvin waves in a two-layer fluid system on a rotating table. Waves in our system propagate in the prograde direction and are exponentially localized near the boundary. Our experiments verify the theoretical dispersion relation of the wave and show that the wave amplitude decays exponentially along the propagation direction. We further demonstrate that the waves can robustly propagate along boundaries of complex geometries without being scattered and that adding obstacles to the wave propagation path does not cause additional attenuation.

preprint2021arXiv

Weakly-Supervised Open-Retrieval Conversational Question Answering

Recent studies on Question Answering (QA) and Conversational QA (ConvQA) emphasize the role of retrieval: a system first retrieves evidence from a large collection and then extracts answers. This open-retrieval ConvQA setting typically assumes that each question is answerable by a single span of text within a particular passage (a span answer). The supervision signal is thus derived from whether or not the system can recover an exact match of this ground-truth answer span from the retrieved passages. This method is referred to as span-match weak supervision. However, information-seeking conversations are challenging for this span-match method since long answers, especially freeform answers, are not necessarily strict spans of any passage. Therefore, we introduce a learned weak supervision approach that can identify a paraphrased span of the known answer in a passage. Our experiments on QuAC and CoQA datasets show that the span-match weak supervisor can only handle conversations with span answers, and has less satisfactory results for freeform answers generated by people. Our method is more flexible as it can handle both span answers and freeform answers. Moreover, our method can be more powerful when combined with the span-match method which shows it is complementary to the span-match method. We also conduct in-depth analyses to show more insights on open-retrieval ConvQA under a weak supervision setting.
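
The span-match weak supervision described in this abstract can be illustrated with a minimal sketch. The function names `normalize` and `span_match_label` and the normalization rules (lowercasing, punctuation stripping) are assumptions made for illustration; they are not taken from the paper, and the paper's learned weak supervisor is not shown here.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def span_match_label(answer: str, passage: str) -> bool:
    """Return True if the known answer occurs as an exact token span in the passage.

    This is the weak supervision signal: a retrieved passage counts as a
    positive only when the ground-truth answer can be recovered verbatim.
    """
    norm_answer = normalize(answer)
    if not norm_answer:
        return False
    # Pad with spaces so that matches respect token boundaries.
    return f" {norm_answer} " in f" {normalize(passage)} "


passage = "She visited the Eiffel Tower, in Paris."
span_answer = "the Eiffel Tower"
freeform_answer = "a famous tower she went to see"
print(span_match_label(span_answer, passage))
print(span_match_label(freeform_answer, passage))
```

The second call shows the limitation the abstract points out: a freeform answer that paraphrases the passage yields no supervision signal under exact span matching.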

preprint2020arXiv

APTER: Aggregated Prognosis Through Exponential Reweighting

This paper considers the task of learning how to make a prognosis of a patient based on his/her micro-array expression levels. The method is an application of the aggregation method as recently proposed in the literature on theoretical machine learning, and excels in its computational convenience and capability to deal with high-dimensional data. A formal analysis of the method is given, yielding rates of convergence similar to what traditional techniques obtain, while it is shown to cope well with an exponentially large set of features. Those results are supported by numerical simulations on a range of publicly available survival-micro-array datasets. It is empirically found that the proposed technique combined with a recently proposed preprocessing technique gives excellent performances.
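
As a rough illustration of aggregation through exponential reweighting, the sketch below implements the generic exponentially weighted average of several predictors. It is not the APTER procedure itself: the function names, the learning rate `eta`, and the use of cumulative losses as input are assumptions chosen for illustration.

```python
import math


def exponential_weights(cumulative_losses, eta=1.0):
    """Weight each predictor proportionally to exp(-eta * cumulative loss).

    Subtracting the minimum loss first leaves the normalized weights
    unchanged and avoids underflow when losses are large.
    """
    min_loss = min(cumulative_losses)
    unnormalized = [math.exp(-eta * (loss - min_loss)) for loss in cumulative_losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def aggregate(predictions, weights):
    """Combine the individual predictions as a weighted average."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical example: three predictors with different past losses.
weights = exponential_weights([1.0, 3.0, 10.0], eta=1.0)
print(weights)
print(aggregate([2.0, 4.0, 9.0], weights))
```

Predictors with smaller cumulative loss receive exponentially larger weight, so the aggregate tracks the best predictors without selecting a single one.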

preprint2020arXiv

IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems

Personal assistant systems, such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana, are becoming ever more widely used. Understanding user intent such as clarification questions, potential answers and user feedback in information-seeking conversations is critical for retrieving good responses. In this paper, we analyze user intent patterns in information-seeking conversations and propose an intent-aware neural response ranking model "IART", which refers to "Intent-Aware Ranking with Transformers". IART is built on top of the integration of user intent modeling and language representation learning with the Transformer architecture, which relies entirely on a self-attention mechanism instead of recurrent nets. It incorporates intent-aware utterance attention to derive an importance weighting scheme of utterances in conversation context with the aim of better conversation history understanding. We conduct extensive experiments with three information-seeking conversation data sets including both standard benchmarks and commercial data. Our proposed model outperforms all baseline methods with respect to a variety of metrics. We also perform case studies and analysis of learned user intent and its impact on response ranking in information-seeking conversations to provide interpretation of results.

preprint2020arXiv

Match$^2$: A Matching over Matching Model for Similar Question Identification

Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers for the enrichment of the archived questions. Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions could be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching over matching model, namely Match$^2$, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.

preprint2020arXiv

Open-Retrieval Conversational Question Answering

Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this limitation, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems. We create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers. Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial for ORConvQA. We further show that our system can make a substantial improvement when we enable history modeling in all system components. Moreover, we show that the reranker component contributes to the model performance by providing a regularization effect. Finally, further in-depth analyses are performed to provide new insights into ORConvQA.

preprint2020arXiv

Particle Filtering Under General Regime Switching

In this paper, we consider a new framework for particle filtering under model uncertainty that operates beyond the scope of Markovian switching systems. Specifically, we develop a novel particle filtering algorithm that applies to general regime switching systems, where the model index is augmented as an unknown time-varying parameter in the system. The proposed approach does not require the use of multiple filters and can maintain a diverse set of particles for each considered model through appropriate choice of the particle filtering proposal distribution. The flexibility of the proposed approach allows for long-term dependencies between the models, which enables its use to a wider variety of real-world applications. We validate the method on a synthetic data experiment and show that it outperforms state-of-the-art multiple model particle filtering approaches that require the use of multiple filters.

preprint2020arXiv

Reinforcement Learning for Active Flow Control in Experiments

We demonstrate experimentally the feasibility of applying reinforcement learning (RL) in flow control problems by automatically discovering active control strategies without any prior knowledge of the flow physics. We consider the turbulent flow past a circular cylinder with the aim of reducing the cylinder drag force or maximizing the power gain efficiency by properly selecting the rotational speed of two small diameter cylinders, parallel to and located downstream of the larger cylinder. Given properly designed rewards and noise reduction techniques, after tens of towing experiments, the RL agent could discover the optimal control strategy, comparable to the optimal static control. While RL has been found to be effective in recent computer flow simulation studies, this is the first time that its effectiveness is demonstrated experimentally, paving the way for exploring new optimal active flow control strategies in complex fluid mechanics applications.

preprint2020arXiv

Solving Inverse Stochastic Problems from Discrete Particle Observations Using the Fokker-Planck Equation and Physics-informed Neural Networks

The Fokker-Planck (FP) equation governing the evolution of the probability density function (PDF) is applicable to many disciplines but it requires specification of the coefficients for each case, which can be functions of space-time and not just constants, hence requiring the development of a data-driven modeling approach. When the data available is directly on the PDF, then there exist methods for inverse problems that can be employed to infer the coefficients and thus determine the FP equation and subsequently obtain its solution. Herein, we address a more realistic scenario, where only sparse data are given on the particles' positions at a few time instants, which are not sufficient to accurately construct directly the PDF even at those times from existing methods, e.g., kernel estimation algorithms. To this end, we develop a general framework based on physics-informed neural networks (PINNs) that introduces a new loss function using the Kullback-Leibler divergence to connect the stochastic samples with the FP equation, to simultaneously learn the equation and infer the multi-dimensional PDF at all times. In particular, we consider two types of inverse problems, type I where the FP equation is known but the initial PDF is unknown, and type II in which, in addition to unknown initial PDF, the drift and diffusion terms are also unknown. In both cases, we investigate problems with either Brownian or Levy noise or a combination of both. We demonstrate the new PINN framework in detail in the one-dimensional case (1D) but we also provide results for up to 5D demonstrating that we can infer both the FP equation and dynamics simultaneously at all times with high accuracy using only very few discrete observations of the particles.

preprint2020arXiv

Sparse Sinkhorn Attention

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention for encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification and natural language inference, we demonstrate that our memory efficient Sinkhorn Attention method is competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers.
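
The differentiable sorting mentioned in this abstract builds on Sinkhorn normalization, which turns a square score matrix into an approximately doubly stochastic matrix (a relaxed permutation). The sketch below shows only that plain normalization step in pure Python; the function name `sinkhorn` and the iteration count are assumptions, and the paper's Causal Sinkhorn Balancing, SortCut, and meta sorting network are not reproduced.

```python
import math


def sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns of exp(scores).

    For a square matrix with strictly positive entries this iteration
    converges to a doubly stochastic matrix (rows and columns sum to 1).
    """
    n = len(scores)
    # Exponentiate so that every entry is strictly positive.
    m = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        row_sums = [sum(row) for row in m]
        m = [[m[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Column normalization: each column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Hypothetical 3x3 score matrix, e.g. block-to-block sorting scores.
scores = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 1.0],
          [2.0, 1.0, 0.0]]
for row in sinkhorn(scores):
    print([round(v, 4) for v in row])
```

Each row of the result can be read as a soft assignment of one input block to output positions; sharper scores push the matrix closer to a hard permutation.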

preprint2019arXiv

Reflection Phase Shift of One-dimensional Plasmon Polaritons in Carbon Nanotubes

We investigated, both experimentally and theoretically, the reflection phase shift (RPS) of one-dimensional plasmon polaritons. We launched 1D plasmon polaritons in carbon nanotubes and probed the plasmon interference pattern using scanning near-field optical microscopy (SNOM), through which a nonzero phase shift was observed. We further developed a theory to understand the nonzero phase shift of 1D polaritons, and found that the RPS can be understood by considering the evanescent field beyond the nanotube end. Interestingly, our theory shows a strong dependence of the RPS on polariton wavelength and nanotube diameter, which is in stark contrast to 2D plasmon polaritons in graphene, where the RPS is a constant. In the short-wavelength region, the RPS of 1D polaritons depends only on a dimensionless variable -- the ratio between polariton wavelength and nanotube diameter. These results provide fundamental insights into the reflection of polaritons in 1D systems, and could facilitate the design of ultrasmall 1D polaritonic devices, such as resonators and interferometers.

preprint2017arXiv

Entangling light field with mechanical resonator at high temperature

We present a study on how to realize optomechanical entanglement, a topic of wide interest, at high temperature. Unlike the majority of previous experimental and theoretical studies, which consider the entanglement of a mechanical resonator with a cavity field created by a red-detuned continuous-wave or blue-detuned pulsed driving field, we find that applying a blue-detuned continuous-wave pump field to cavity optomechanical systems can achieve considerable degrees of quantum entanglement, which is generally challenging to obtain at high temperature for the known physical systems. The competition between the induced squeezing-type interaction and the existing decoherence leads to stable entanglement in the dynamically unstable regime. The condition for the existence of entanglement is much more relaxed than the well-known criterion for neglecting the thermal decoherence on optomechanically coupled systems. A simple relation for a boundary in the parameter space, across which the entanglement can exist or not, is found with an analytical expression for the degree of the achieved entanglement at any temperature, which is derived for systems in the highly resolved sideband regime. The studied scenario with a blue-detuned continuous-wave driving field can greatly simplify the generation of optomechanical entanglement of macroscopic quantum states. Our study also provides answers to two fundamentally meaningful open problems: (1) what is the condition for a system to avoid its loss of quantum entanglement under thermal decoherence? (2) is it possible to preserve the entanglement in a thermal environment by increasing the interaction that entangles the subsystems?

preprint2017arXiv

Parity-time-symmetric optical lattice with alternating gain and loss atomic configurations

Since the spatially extended periodic parity-time (PT) symmetric potential can possess certain unique properties compared to a single PT cell (with only a pair of coupled gain-loss components), various schemes have been proposed to realize periodic PT-symmetric potentials based on optical lattices. Here, we experimentally construct a spatially periodic PT-symmetric optical potential based on gain-loss arrays induced in a coherently-prepared atomic medium. The gain and loss arrays are generated in alternating four-level N-type and three-level $\Lambda$-type configurations in the same atomic medium, respectively, which do not require discrete diffractions as demonstrated in the previous work [Phys. Rev. Lett. 117, 123601 (2016)] and can be easier to realize with more relaxed operating conditions. The dynamical behaviors of the system are investigated by measuring the phase difference between two adjacent gain and loss channels. The demonstrated PT-symmetric optical lattice with easy accessibility and better tunability sets a new stage for further exploiting the peculiar physical properties in periodic non-Hermitian systems.