Source author record

Liu Yang

Liu Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

62 works

43 topics

4 close collaborators


preprint · 2026 · arXiv

Evolutionary Ensemble of Agents

We introduce Evolutionary Ensemble (EvE), a decentralized framework that organizes existing, highly capable coding agents into a live, co-evolving system for algorithmic discovery. Rather than reinventing the wheel within the "LLMs as optimizers" paradigm, EvE fixes the base agent substrate and focuses entirely on evolving the cumulative guidance and skills that dictate agent behaviors. By maintaining two co-evolving populations, namely functional code solvers and agent guidance states, the system evaluates agents through a synchronous race, updating their empirical Elo ratings based on the marginal gains they contribute to the current solver state. When applied to a research bottleneck in In-Context Operator Networks (ICON), EvE autonomously discovered a robust rescale-then-interpolate mechanism that enables reliable example-count generalization. Crucially, controlled ablations reveal the absolute necessity of stage-dependent agent adaptation to navigate the shifting search landscapes of complex codebases. Compared to variants driven by a fixed initial agent or even a frozen "best-evolved" agent, EvE uniquely avoids phase mismatch, demonstrating that organizing agents into a self-revising ensemble is the fundamental driver for breaking through static performance ceilings.
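The abstract mentions empirical Elo ratings updated from marginal gains but does not give the rule itself. As a rough illustration only, the sketch below applies the standard pairwise Elo update; the K-factor of 32, the 400-point scale, and the `race_outcome` helper that turns two marginal gains into a win, loss or tie are all assumptions, not details taken from the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of agent A against agent B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def race_outcome(gain_a: float, gain_b: float) -> float:
    """Hypothetical helper: score for A given both agents' marginal gains.

    Returns 1.0 if A contributed more, 0.0 if less, 0.5 on a tie.
    """
    if gain_a > gain_b:
        return 1.0
    if gain_a < gain_b:
        return 0.0
    return 0.5


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the updated (rating_a, rating_b) after one comparison.

    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated agents; A contributes the larger marginal gain.
    print(update_elo(1500.0, 1500.0, race_outcome(0.3, 0.1)))
```

Because the update is zero-sum, the total rating in the population stays constant, so ratings only express relative standing among the agents in the race.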

preprint · 2026 · arXiv

One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning

In Large Language Model (LLM) fine-tuning, parameter and data selection are common strategies for reducing fine-tuning cost, yet they are typically driven by separate scoring mechanisms. When a parameter mask and data subset jointly determine restricted fine-tuning, this separation incurs redundant overhead and makes coordinated selection difficult. We cast parameter and data selection as two bilevel selection problems under a common validation objective and derive a shared local response-surrogate scoring rule. Under first- and second-order validation-improvement approximations, parameter importance and data utility emerge as column-wise and row-wise aggregations of a single gradient interaction matrix, yielding a closed-form row-column correspondence for co-extracting both signals. Building on this structure, we propose DualSFT (Dual-Selection Fine-Tuning), a one-shot dual-scoring algorithm that produces a parameter mask and data subset from shared gradient statistics. On 3B-9B LLMs, single-axis DualSFT variants strengthen target-task performance and stability-plasticity trade-offs within their comparison groups, while full DualSFT yields a more favorable joint-constrained trade-off than sequential hybrid baselines under matched budgets.

preprint · 2026 · arXiv

VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multiphysics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines: DPOT and MPP, reducing the averaged last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in imperfect measurement systems where sampling frequencies may differ or frames might be dropped - common challenges in real-world settings - without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41% relative performance degradation compared to 71.37%-74.49% degradation in baseline methods, demonstrating its versatility for deploying in realistic applications. Our scripts for processing datasets and code are publicly available at https://github.com/Eydcao/VICON.

preprint · 2025 · arXiv

A Tale of Two Shocks

Energetic particles in interplanetary space are normally measured at time scales that are long compared to the ion gyroperiod. Such observations by necessity average out the microphysics associated with the acceleration and transport of 10s - 100s keV particles. We investigate previously unseen non-equilibrium features that only become observable at very high time resolution, and discuss possible explanations of these features. We use unprecedentedly high-time-resolution data that were acquired by the in situ instruments on Solar Orbiter in the vicinity of two interplanetary shocks observed on 2023-11-29 07:51:17 UTC and 2023-11-30 10:47:26 UTC at $\sim 0.83$ astronomical units from the Sun. The solar-wind proton beam population follows the magnetic field instantaneously, on time scales which are significantly shorter than a gyro-period. Energetic particles, despite sampling large volumes of space, vary on remarkably short time scales, typically on the order of the convection time of their gyro-radius. Non-equilibrium features such as bump-on-tail distributions of energetic particles are formed by small-scale magnetic structures in the IMF. High-time-resolution observations show previously unobserved microphysics in the vicinity of two traveling interplanetary shocks, including ion reflection at a current sheet, which may explain where ions are reflected in shock acceleration.

preprint · 2025 · arXiv

Interface-Controlled Antiferromagnetic Tunnel Junctions based on a metallic van der Waals A-type Antiferromagnet

Magnetic tunnel junctions (MTJs) are crucial components in high-performance spintronic devices. Traditional MTJs rely on ferromagnetic (FM) materials but significant improvements in speed and packing density could be enabled by exploiting antiferromagnetic (AFM) compounds instead. Here, we report all-collinear AFM tunnel junctions (AFMTJs) fabricated with van der Waals A-type AFM metal (Fe0.6Co0.4)5GeTe2 (FCGT) electrodes and nonmagnetic semiconducting WSe2 tunnel barriers. The AFMTJ heterostructure device achieves a tunneling magnetoresistance (TMR) ratio of up to 75% in response to magnetic field switching. Our results demonstrate that the TMR exclusively emerges in the AFM state of FCGT, rather than during the AFM-to-FM transition. By engineering FCGT electrodes with either even- or odd-layer configurations, volatile or non-volatile TMR could be selected, consistent with an entirely interfacial effect. TMR in the even-layer devices arose by Néel vector switching. In the odd-layer devices, TMR stemmed from interfacial spin-flipping. Experimental and theoretical analyses reveal a new TMR mechanism associated with interface-driven spin-polarized transport, despite the spin-independent nature of bulk FCGT. Our work demonstrates that collinear AFMTJs can provide comparable performance to conventional MTJs and introduces a new paradigm for AFM spintronics, in which the spin-dependent properties of AFM interfaces are harnessed.

preprint · 2024 · arXiv

Nonvolatile optical control of interlayer stacking order in 1T-TaS2

Nonvolatile optical manipulation of material properties on demand is a highly sought-after feature in the advancement of future optoelectronic applications. While the discovery of such metastable transitions in various materials holds good promise for achieving this goal, their practical implementation is still at a nascent stage. Here, we unravel the nature of the ultrafast laser-induced hidden state in 1T-TaS2 by systematically characterizing the electronic structure evolution throughout the reversible transition cycle. We identify it as a mixed-stacking state involving two similarly low-energy interlayer orders, which is manifested as a disruption of the charge density wave phase. Furthermore, our comparative experiments utilizing single-pulse writing, pulse-train erasing and pulse-pair control explicitly reveal the distinct mechanisms of the bidirectional transformations -- the ultrafast formation of the hidden state is initiated by a coherent phonon which triggers a competition of interlayer stacking orders, while its recovery to the initial state is governed by progressive domain coarsening. Our work highlights the deterministic role of the competing interlayer orders in the nonvolatile phase transition in the layered material 1T-TaS2, and promises coherent control of the phase transition and switching speed. More importantly, these results establish all-optical engineering of stacking orders in low-dimensional materials as a viable strategy for achieving desirable nonvolatile electronic devices.

preprint · 2022 · arXiv

A Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Industrial Wireless Sensor Networks

Security is one of the major concerns in Industrial Wireless Sensor Networks (IWSNs). To ensure security in clustered IWSNs, this paper presents a secure clustering protocol with fuzzy trust evaluation and outlier detection (SCFTO). First, to deal with the transmission uncertainty in an open wireless medium, an interval type-2 fuzzy logic controller is adopted to estimate the trust values. Then, a density-based outlier detection mechanism is introduced to acquire an adaptive trust threshold used to prevent malicious nodes from becoming cluster heads. Finally, a fuzzy-based cluster head election method is proposed to achieve a balance between energy saving and security assurance, so that a normal sensor node with more residual energy or less confidence in other nodes has a higher probability of becoming the cluster head. Extensive experiments verify that our secure clustering protocol can effectively defend the network against attacks from internal malicious or compromised nodes.

preprint · 2022 · arXiv

An Evolutionary Game based Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Wireless Sensor Networks

Trustworthy and reliable data delivery is a challenging task in Wireless Sensor Networks (WSNs) due to their unique characteristics and constraints. To achieve secure data delivery and address the conflict between security and energy, in this paper we present an evolutionary game based secure clustering protocol with fuzzy trust evaluation and outlier detection for WSNs. First, a fuzzy trust evaluation method is presented to transform the transmission evidence into trust values while effectively alleviating the trust uncertainty. Then, a K-Means based outlier detection scheme is proposed to further analyze the large number of trust values obtained via fuzzy trust evaluation or trust recommendation. It can discover the commonalities and differences among sensor nodes while improving the accuracy of outlier detection. Finally, we present an evolutionary game based secure clustering protocol to achieve a trade-off between security assurance and energy saving for sensor nodes when electing cluster heads. A sensor node that fails to become the cluster head can securely choose its own head by isolating the suspicious nodes. Simulation results verify that our secure clustering protocol can effectively defend the network against attacks from internal selfish or compromised nodes. Correspondingly, the timely data transfer rate can be improved significantly.

preprint · 2022 · arXiv

An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things

The edge-enabled Industrial Internet of Things (IIoT) platform is of great significance for accelerating the development of smart industry. However, with the dramatic increase in real-time IIoT applications, it is a great challenge to support fast response time, low latency, and efficient bandwidth utilization. To address this issue, Time Sensitive Network (TSN) has recently been studied as a way to realize low latency communication via deterministic scheduling. To the best of our knowledge, the combinability of multiple flows, which can significantly affect the scheduling performance, has never been systematically analyzed before. In this article, we first analyze the combinability problem. Then a non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for the time-sensitive flows. Moreover, to improve bandwidth utilization, a dynamic queue scheduling (DQS) method is presented for the best-effort flows. Experimental results demonstrate that NDS/DQS can support deterministic ultra-low latency services well and guarantee efficient bandwidth utilization.

preprint · 2022 · arXiv

An Intelligent Trust Cloud Management Method for Secure Clustering in 5G enabled Internet of Medical Things

5G edge computing enabled Internet of Medical Things (IoMT) is an efficient technology to provide decentralized medical services while Device-to-device (D2D) communication is a promising paradigm for future 5G networks. To assure secure and reliable communication in 5G edge computing and D2D enabled IoMT systems, this paper presents an intelligent trust cloud management method. Firstly, an active training mechanism is proposed to construct the standard trust clouds. Secondly, individual trust clouds of the IoMT devices can be established through fuzzy trust inferring and recommending. Thirdly, a trust classification scheme is proposed to determine whether an IoMT device is malicious. Finally, a trust cloud update mechanism is presented to make the proposed trust management method adaptive and intelligent under an open wireless medium. Simulation results demonstrate that the proposed method can effectively address the trust uncertainty issue and improve the detection accuracy of malicious devices.

preprint · 2022 · arXiv

Downwash-aware Control Allocation for Over-actuated UAV Platforms

Tracking position and orientation independently affords more agile maneuvers for over-actuated multirotor Unmanned Aerial Vehicles (UAVs) while introducing undesired downwash effects; downwash flows generated by thrust generators may counteract one another due to close proximity, which significantly threatens the stability of the platform. The complexity of modeling aerodynamic airflow makes it difficult for control algorithms to compensate properly for such a side effect. Leveraging the input redundancies in over-actuated UAVs, we tackle this issue with a novel control allocation framework that considers downwash effects and explores the entire allocation space for an optimal solution. This optimal solution avoids downwash effects while providing high thrust efficiency within the hardware constraints. To the best of our knowledge, ours is the first formal derivation to investigate the downwash effects on over-actuated UAVs. We verify our framework on different hardware configurations in both simulation and experiment.

preprint · 2022 · arXiv

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Bulk synchronous parallel (BSP) is the de-facto paradigm for distributed DNN training in today's production clusters. However, due to its global synchronization nature, its performance can be significantly influenced by network bottlenecks caused by either static topology heterogeneity or dynamic bandwidth contention. Existing solutions, either system-level optimizations strengthening BSP (e.g., Ring or Hierarchical All-reduce) or algorithmic optimizations replacing BSP (e.g., ASP or SSP, which relax the global barriers), do not completely solve the problem, as they may still suffer from communication inefficiency or risk convergence inaccuracy. In this paper, we present a novel divide-and-shuffle synchronization (DS-Sync) to realize communication efficiency without sacrificing convergence accuracy for distributed DNN training. At its heart, by taking into account the network bottlenecks, DS-Sync improves communication efficiency by dividing workers into non-overlapping groups that synchronize independently in a bottleneck-free manner. Meanwhile, it maintains convergence accuracy by iteratively shuffling workers among different groups to ensure a global consensus. We theoretically prove that DS-Sync converges properly under non-convex and smooth conditions, as in DNN training. We further implement DS-Sync and integrate it with PyTorch, and our testbed experiments show that DS-Sync can achieve up to $94\%$ improvement in end-to-end training time over existing solutions while maintaining the same accuracy.
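The divide-and-shuffle idea can be illustrated with a toy model. The sketch below is not the DS-Sync implementation: it assumes each worker holds a single scalar parameter, forms groups by a uniformly random shuffle (the paper instead takes network bottlenecks into account), and replaces gradient synchronization with plain averaging. The names `divide_and_shuffle_round` and `spread` are hypothetical.

```python
import random


def divide_and_shuffle_round(values, group_size, rng):
    """Run one round: shuffle workers, split into groups, average per group.

    `values[i]` is the scalar held by worker i. Groups are disjoint, so each
    group could synchronize independently of the others.
    """
    order = list(range(len(values)))
    rng.shuffle(order)  # new group membership every round
    new_values = list(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            new_values[i] = mean
    return new_values


def spread(values):
    """Gap between the largest and smallest worker value."""
    return max(values) - min(values)


if __name__ == "__main__":
    rng = random.Random(0)
    workers = [float(i) for i in range(8)]
    for round_idx in range(10):
        workers = divide_and_shuffle_round(workers, group_size=2, rng=rng)
        print(round_idx, round(spread(workers), 4))
```

Averaging inside a group preserves that group's sum, so the global mean never changes, while reshuffling lets information mix across groups over successive rounds. That mixing is the intuition behind reaching a global consensus without a global barrier.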

preprint · 2022 · arXiv

Generative Adversarial Learning for Intelligent Trust Management in 6G Wireless Networks

The emerging sixth generation (6G) is an integration of heterogeneous wireless networks, which can seamlessly support anywhere and anytime networking. However, a high Quality-of-Trust should be offered by 6G to meet mobile user expectations. Artificial intelligence (AI) is considered one of the most important components in 6G. AI-based trust management is therefore a promising paradigm to provide trusted and reliable services. In this article, a generative adversarial learning-enabled trust management method is presented for 6G wireless networks. Some typical AI-based trust management schemes are first reviewed, and then a potential heterogeneous and intelligent 6G architecture is introduced. Next, the integration of AI and trust management is developed to optimize the intelligence and security. Finally, the presented AI-based trust management method is applied to secure clustering to achieve reliable and real-time communications. Simulation results have demonstrated its excellent performance in guaranteeing network security and service quality.

preprint · 2022 · arXiv

High Spatial and Temporal Resolution NIR-IIb Gastrointestinal Imaging in Mice

Conventional biomedical imaging modalities, including endoscopy, X-rays, and magnetic resonance, are invasive and cannot provide sufficient spatial and temporal resolutions for regular imaging of the gastrointestinal (GI) tract to guide prognosis and therapy of GI diseases. Here we report a non-invasive method for optical imaging of the GI tract. It is based on a new type of lanthanide-doped nanocrystal with near-infrared (NIR) excitation at 980 nm and second NIR window (NIR-IIb) (1500~1700 nm) fluorescence emission at around 1530 nm. The rational design and controlled synthesis of nanocrystals with high brightness have led to an absolute quantum yield (QY) of up to 48.6%. Further benefitting from the minimized scattering through the NIR-IIb window, we enhanced the spatial resolution by 3 times compared with other NIR-IIa (1000~1500 nm) contrast agents for GI tract imaging. The approach also led to a high temporal resolution of 8 frames per second, so that moments of intestinal peristalsis in mice occurring within one minute can be captured. Furthermore, with a light-sheet imaging system, we demonstrated three-dimensional (3D) imaging of the stereoscopic structure of the GI tract. Moreover, we successfully translated these advances to diagnose inflammatory bowel disease (IBD) in a pre-clinical mouse model of colitis.

preprint · 2022 · arXiv

Inflation-induced aneurysm formation and evolution in graded cylindrical tubes of arbitrary thickness

We study the initiation and evolution of aneurysmal morphology in a pressurized soft tube where the elastic modulus is non-uniform in the radial direction. The primary deformation prior to instability is characterized within the framework of nonlinear elasticity for a general material constitution and a generic modulus gradient. To unravel the influence of modulus gradient on aneurysm formation, we employ the incompressible Gent model and select three representative modulus gradients, including a linear, an exponential, and a sinusoidal function. In particular, the sinusoidal distribution can be used to model actual arterial structure. In addition, two prototypical loading conditions are considered, namely, either the resultant axial force or the axial length can be fixed. Based on an explicit bifurcation condition in terms of the internal pressure and the resultant axial force for aneurysm formation or localized bulging, an exhaustive theoretical analysis of bulge initiation is carried out and the effect of geometric and material parameters and modulus gradient on the critical stretch generating localized bulging is revealed. It turns out that the modulus mismatch, as well as the position of maximum modulus, can dramatically affect the onset of localized bulging. Then we analytically elucidate the influence of modulus gradient on bulge propagation and conduct a finite element analysis of bulge evolution based on a robust finite element model established in Abaqus by UHYPER subroutine coding. Interestingly, it is found that a sinusoidally distributed modulus has negligible influence on the critical stretch of bulge initiation, the deformation process of bulge growth, and the maximum size of a bulge. The current analysis can provide useful insight into the biological evolution of human arteries and into localized instabilities in graded structures.

preprint · 2022 · arXiv

Phonon-mediated superconductivity in two-dimensional hydrogenated phosphorus carbide: HPC$_{3}$

In recent years, three-dimensional (3D) high-temperature superconductors at ultrahigh pressure have been reported; typical examples are the polyhydrides H$_{3}$S, LaH$_{10}$, and YH$_{9}$. Finding high-temperature superconductors in two dimensions (2D) at atmospheric pressure is another research hotspot. Here, we investigated the possible superconductivity in a hydrogenated monolayer phosphorus carbide based on first-principles calculations. The results reveal that monolayer PC$_{3}$ transforms from a semiconductor to a metal after hydrogenation. Interestingly, the C-$π$-bonding band contributes most to the states at the Fermi level. Based on the electron-phonon coupling mechanism, it is found that the electron-phonon coupling constant of HPC$_{3}$ is 0.95, which mainly originates from the coupling of C-$π$ electrons with the in-plane vibration modes of C and H. The calculated critical temperature $T_{c}$ is 31.0 K, which is higher than that of most 2D superconductors. By further applying a biaxial tensile strain of 3$\%$, the $T_{c}$ can be boosted to 57.3 K, exceeding the McMillan limit. Thus, hydrogenation and strain are effective ways of increasing the superconducting $T_{c}$ of 2D materials.

preprint · 2022 · arXiv

Practical and Secure Federated Recommendation with Personalized Masks

Federated recommendation addresses the data silo and privacy problems together for recommender systems. Current federated recommender systems mainly utilize cryptographic or obfuscation methods to protect the original ratings from leakage. However, the former comes with extra communication and computation costs, and the latter damages model accuracy. Neither of them can simultaneously satisfy the real-time feedback and accurate personalization requirements of recommender systems. In this paper, we propose federated masked matrix factorization (FedMMF) to protect data privacy in federated recommender systems without sacrificing efficiency and effectiveness. In more detail, we introduce the new idea of a personalized mask generated only from local data and apply it in FedMMF. On the one hand, the personalized mask offers protection for participants' private data without effectiveness loss. On the other hand, combined with the adaptive secure aggregation protocol, the personalized mask can further improve efficiency. Theoretically, we provide a security analysis for the personalized mask. Empirically, we also show the superiority of the designed model on different real-world data sets.

preprint · 2022 · arXiv

Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

With the enactment of privacy-preserving regulations, e.g., GDPR, federated SVD is proposed to enable SVD-based applications over different data sources without revealing the original data. However, many SVD-based applications cannot be well supported by existing federated SVD solutions. The crux is that these solutions, adopting either differential privacy (DP) or homomorphic encryption (HE), suffer from accuracy loss caused by unremovable noise or degraded efficiency due to inflated data. In this paper, we propose FedSVD, a practical lossless federated SVD method over billion-scale data, which can simultaneously achieve lossless accuracy and high efficiency. At the heart of FedSVD is a lossless matrix masking scheme delicately designed for SVD: 1) While adopting the masks to protect private data, FedSVD completely removes them from the final results of SVD to achieve lossless accuracy; and 2) As the masks do not inflate the data, FedSVD avoids extra computation and communication overhead during the factorization to maintain high efficiency. Experiments with real-world datasets show that FedSVD is over 10000 times faster than the HE-based method and has 10 orders of magnitude smaller error than the DP-based solution on SVD tasks. We further build and evaluate FedSVD over three real-world applications: principal components analysis (PCA), linear regression (LR), and latent semantic analysis (LSA), to show its superior performance in practice. On federated LR tasks, compared with two state-of-the-art solutions: FATE and SecureML, FedSVD-LR is 100 times faster than SecureML and 10 times faster than FATE.

preprint · 2022 · arXiv

Rare Gems: Finding Lottery Tickets at Initialization

Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle et al. and Su et al. presents concrete evidence that current algorithms for finding trainable networks at initialization fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to better accuracy compared to simple baselines remains an open problem. In this work, we resolve this open problem by proposing Gem-Miner, which finds lottery tickets at initialization that beat current baselines. Gem-Miner finds lottery tickets trainable to accuracy competitive with or better than Iterative Magnitude Pruning (IMP), and does so up to $19\times$ faster.

preprint · 2022 · arXiv

Secure Forward Aggregation for Vertical Federated Neural Networks

Vertical federated learning (VFL) is attracting much attention because it enables cross-silo data cooperation in a privacy-preserving manner. While most research works in VFL focus on linear and tree models, deep models (e.g., neural networks) are not well studied in VFL. In this paper, we focus on SplitNN, a well-known neural network framework in VFL, and identify a trade-off between data security and model performance in SplitNN. Briefly, SplitNN trains the model by exchanging gradients and transformed data. On the one hand, SplitNN suffers from a loss of model performance since multiple parties jointly train the model using transformed data instead of raw data, and a large amount of low-level feature information is discarded. On the other hand, a naive solution of increasing the model performance through aggregating at lower layers in SplitNN (i.e., the data is less transformed and more low-level features are preserved) makes raw data vulnerable to inference attacks. To mitigate the above trade-off, we propose a new neural network protocol in VFL called Security Forward Aggregation (SFA). It changes the way of aggregating the transformed data and adopts removable masks to protect the raw data. Experimental results show that networks with SFA achieve both data security and high model performance.

preprint · 2022 · arXiv

SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space

In this work, we formulate a visual dialog as an information flow in which each piece of information is encoded with the joint visual-linguistic representation of a single dialog round. Based on this formulation, we consider the visual dialog task as a sequence problem consisting of ordered visual-linguistic vectors. For featurization, we use a Dense Symmetric Co-Attention network as a lightweight vision-language joint representation generator to fuse multimodal features (i.e., image and text), yielding better computation and data efficiencies. For inference, we propose two Sequential Dialog Networks (SeqDialN): the first uses LSTM for information propagation (IP) and the second uses a modified Transformer for multi-step reasoning (MR). Our architecture separates the complexity of multimodal feature fusion from that of inference, which allows simpler design of the inference engine. IP based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance. MR based SeqDialN, on the other hand, recurrently refines the semantic question/history representations through the self-attention stack of Transformer and produces promising results on the visual dialog task. On the VisDial v1.0 test-std dataset, our best single generative SeqDialN achieves 62.54% NDCG and 48.63% MRR; our ensemble generative SeqDialN achieves 63.78% NDCG and 49.98% MRR, setting a new state of the art for generative visual dialog models. We fine-tune discriminative SeqDialN with dense annotations and boost the performance up to 72.41% NDCG and 55.11% MRR. In this work, we discuss the extensive experiments we have conducted to demonstrate the effectiveness of our model components. We also provide visualization for the reasoning process from the relevant conversation rounds and discuss our fine-tuning methods. Our code is available at https://github.com/xiaoxiaoheimei/SeqDialN

preprint · 2021 · arXiv

Flow-based Generative Models for Learning Manifold to Manifold Mappings

Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there are also no modality transfer/translation models for manifold-valued data whereas numerous such methods based on generative models are available for natural images. This paper addresses this gap, motivated by a need in brain imaging -- in doing so, we expand the operating range of certain generative models (as well as generative models for modality transfer) from natural images to images with manifold-valued measurements. Our main result is the design of a two-stream version of GLOW (flow-based invertible generative models) that can synthesize information of a field of one type of manifold-valued measurements given another. On the theoretical side, we introduce three kinds of invertible layers for manifold-valued data, which are not only analogous to their functionality in flow-based generative models (e.g., GLOW) but also preserve the key benefits (determinants of the Jacobian are easy to calculate). For experiments, on a large dataset from the Human Connectome Project (HCP), we show promising results where we can reliably and accurately reconstruct brain images of a field of orientation distribution functions (ODF) from diffusion tensor images (DTI), where the latter has a $5\times$ faster acquisition time but at the expense of worse angular resolution.

preprint2021arXiv

Learning Functional Priors and Posteriors from Data and Physics

We develop a new Bayesian framework based on deep neural networks to be able to extrapolate in space-time using historical data and to quantify uncertainties arising from both noisy and gappy data in physical problems. Specifically, the proposed approach has two stages: (1) prior learning and (2) posterior estimation. At the first stage, we employ the physics-informed Generative Adversarial Networks (PI-GAN) to learn a functional prior either from a prescribed function distribution, e.g., Gaussian process, or from historical data and physics. At the second stage, we employ the Hamiltonian Monte Carlo (HMC) method to estimate the posterior in the latent space of PI-GANs. In addition, we use two different approaches to encode the physics: (1) automatic differentiation, used in the physics-informed neural networks (PINNs) for scenarios with explicitly known partial differential equations (PDEs), and (2) operator regression using the deep operator network (DeepONet) for PDE-agnostic scenarios. We then test the proposed method for (1) meta-learning for one-dimensional regression, and forward/inverse PDE problems (combined with PINNs); (2) PDE-agnostic physical problems (combined with DeepONet), e.g., fractional diffusion as well as saturated stochastic (100-dimensional) flows in heterogeneous porous media; and (3) spatial-temporal regression problems, i.e., inference of a marine riser displacement field. The results demonstrate that the proposed approach can provide accurate predictions as well as uncertainty quantification given very limited scattered and noisy data, since historical data could be available to provide informative priors. In summary, the proposed method is capable of learning flexible functional priors, and can be extended to big data problems using stochastic HMC or normalizing flows since the latent space is generally characterized as low dimensional.

preprint2021arXiv

Measure-conditional Discriminator with Stationary Optimum for GANs and Statistical Distance Surrogates

We propose a simple but effective modification of the discriminators, namely measure-conditional discriminators, as a plug-and-play module for different GANs. By taking the generated distributions as part of input so that the target optimum for the discriminator is stationary, the proposed discriminator is more robust than the vanilla one. A variant of the measure-conditional discriminator can also handle multiple target distributions, or act as a surrogate model of statistical distances such as KL divergence with applications to transfer learning.

preprint2021arXiv

Robust propagation of internal coastal Kelvin waves in complex domains

We experimentally investigate internal coastal Kelvin waves in a two-layer fluid system on a rotating table. Waves in our system propagate in the prograde direction and are exponentially localized near the boundary. Our experiments verify the theoretical dispersion relation of the wave and show that the wave amplitude decays exponentially along the propagation direction. We further demonstrate that the waves can robustly propagate along boundaries of complex geometries without being scattered and that adding obstacles to the wave propagation path does not cause additional attenuation.

preprint2021arXiv

Weakly-Supervised Open-Retrieval Conversational Question Answering

Recent studies on Question Answering (QA) and Conversational QA (ConvQA) emphasize the role of retrieval: a system first retrieves evidence from a large collection and then extracts answers. This open-retrieval ConvQA setting typically assumes that each question is answerable by a single span of text within a particular passage (a span answer). The supervision signal is thus derived from whether or not the system can recover an exact match of this ground-truth answer span from the retrieved passages. This method is referred to as span-match weak supervision. However, information-seeking conversations are challenging for this span-match method since long answers, especially freeform answers, are not necessarily strict spans of any passage. Therefore, we introduce a learned weak supervision approach that can identify a paraphrased span of the known answer in a passage. Our experiments on QuAC and CoQA datasets show that the span-match weak supervisor can only handle conversations with span answers, and has less satisfactory results for freeform answers generated by people. Our method is more flexible as it can handle both span answers and freeform answers. Moreover, our method can be more powerful when combined with the span-match method which shows it is complementary to the span-match method. We also conduct in-depth analyses to show more insights on open-retrieval ConvQA under a weak supervision setting.

preprint2020arXiv

APTER: Aggregated Prognosis Through Exponential Reweighting

This paper considers the task of learning how to make a prognosis of a patient based on his/her micro-array expression levels. The method is an application of the aggregation method as recently proposed in the literature on theoretical machine learning, and excels in its computational convenience and capability to deal with high-dimensional data. A formal analysis of the method is given, yielding rates of convergence similar to what traditional techniques obtain, while it is shown to cope well with an exponentially large set of features. Those results are supported by numerical simulations on a range of publicly available survival-micro-array datasets. It is empirically found that the proposed technique combined with a recently proposed preprocessing technique gives excellent performances.

preprint2020arXiv

IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems

Personal assistant systems, such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana, are becoming ever more widely used. Understanding user intent such as clarification questions, potential answers and user feedback in information-seeking conversations is critical for retrieving good responses. In this paper, we analyze user intent patterns in information-seeking conversations and propose an intent-aware neural response ranking model "IART", which refers to "Intent-Aware Ranking with Transformers". IART is built on top of the integration of user intent modeling and language representation learning with the Transformer architecture, which relies entirely on a self-attention mechanism instead of recurrent nets. It incorporates intent-aware utterance attention to derive an importance weighting scheme of utterances in conversation context with the aim of better conversation history understanding. We conduct extensive experiments with three information-seeking conversation data sets including both standard benchmarks and commercial data. Our proposed model outperforms all baseline methods with respect to a variety of metrics. We also perform case studies and analysis of learned user intent and its impact on response ranking in information-seeking conversations to provide interpretation of results.

preprint2020arXiv

Match$^2$: A Matching over Matching Model for Similar Question Identification

Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers for the enrichment of the archived questions. Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge of the two questions. The key idea is based on our observation that similar questions could be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching over matching model, namely Match$^2$, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.

preprint2020arXiv

Open-Retrieval Conversational Question Answering

Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this limitation, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems. We create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers. Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial for ORConvQA. We further show that our system can make a substantial improvement when we enable history modeling in all system components. Moreover, we show that the reranker component contributes to the model performance by providing a regularization effect. Finally, further in-depth analyses are performed to provide new insights into ORConvQA.

preprint2020arXiv

Particle Filtering Under General Regime Switching

In this paper, we consider a new framework for particle filtering under model uncertainty that operates beyond the scope of Markovian switching systems. Specifically, we develop a novel particle filtering algorithm that applies to general regime switching systems, where the model index is augmented as an unknown time-varying parameter in the system. The proposed approach does not require the use of multiple filters and can maintain a diverse set of particles for each considered model through appropriate choice of the particle filtering proposal distribution. The flexibility of the proposed approach allows for long-term dependencies between the models, which enables its use to a wider variety of real-world applications. We validate the method on a synthetic data experiment and show that it outperforms state-of-the-art multiple model particle filtering approaches that require the use of multiple filters.

preprint2020arXiv

Reinforcement Learning for Active Flow Control in Experiments

We demonstrate experimentally the feasibility of applying reinforcement learning (RL) in flow control problems by automatically discovering active control strategies without any prior knowledge of the flow physics. We consider the turbulent flow past a circular cylinder with the aim of reducing the cylinder drag force or maximizing the power gain efficiency by properly selecting the rotational speed of two small diameter cylinders, parallel to and located downstream of the larger cylinder. Given properly designed rewards and noise reduction techniques, after tens of towing experiments, the RL agent could discover the optimal control strategy, comparable to the optimal static control. While RL has been found to be effective in recent computer flow simulation studies, this is the first time that its effectiveness is demonstrated experimentally, paving the way for exploring new optimal active flow control strategies in complex fluid mechanics applications.

preprint2020arXiv

Solving Inverse Stochastic Problems from Discrete Particle Observations Using the Fokker-Planck Equation and Physics-informed Neural Networks

The Fokker-Planck (FP) equation governing the evolution of the probability density function (PDF) is applicable to many disciplines but it requires specification of the coefficients for each case, which can be functions of space-time and not just constants, hence requiring the development of a data-driven modeling approach. When the data available is directly on the PDF, then there exist methods for inverse problems that can be employed to infer the coefficients and thus determine the FP equation and subsequently obtain its solution. Herein, we address a more realistic scenario, where only sparse data are given on the particles' positions at a few time instants, which are not sufficient to accurately construct directly the PDF even at those times from existing methods, e.g., kernel estimation algorithms. To this end, we develop a general framework based on physics-informed neural networks (PINNs) that introduces a new loss function using the Kullback-Leibler divergence to connect the stochastic samples with the FP equation, to simultaneously learn the equation and infer the multi-dimensional PDF at all times. In particular, we consider two types of inverse problems, type I where the FP equation is known but the initial PDF is unknown, and type II in which, in addition to unknown initial PDF, the drift and diffusion terms are also unknown. In both cases, we investigate problems with either Brownian or Lévy noise or a combination of both. We demonstrate the new PINN framework in detail in the one-dimensional case (1D) but we also provide results for up to 5D demonstrating that we can infer both the FP equation and dynamics simultaneously at all times with high accuracy using only very few discrete observations of the particles.

preprint2020arXiv

Sparse Sinkhorn Attention

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention for encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification and natural language inference, we demonstrate that our memory efficient Sinkhorn Attention method is competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers.

preprint2019arXiv

Reflection Phase Shift of One-dimensional Plasmon Polaritons in Carbon Nanotubes

We investigated, both experimentally and theoretically, the reflection phase shift (RPS) of one-dimensional plasmon polaritons. We launched 1D plasmon polaritons in carbon nanotubes and probed the plasmon interference pattern using the scanning near-field optical microscopy (SNOM) technique, through which a non-zero phase shift was observed. We further developed a theory to understand the nonzero phase shift of 1D polaritons, and found that the RPS can be understood by considering the evanescent field beyond the nanotube end. Interestingly, our theory shows a strong dependence of the RPS on polariton wavelength and nanotube diameter, which is in stark contrast to 2D plasmon polaritons in graphene, where the RPS is a constant. In the short-wave region, the RPS of 1D polaritons depends only on a dimensionless variable -- the ratio between polariton wavelength and nanotube diameter. These results provide fundamental insights into the reflection of polaritons in 1D systems, and could facilitate the design of ultrasmall 1D polaritonic devices, such as resonators and interferometers.

preprint2017arXiv

Entangling light field with mechanical resonator at high temperature

We present a study on how to realize the widely interested optomechanical entanglement at high temperature. Unlike the majority of the previous experimental and theoretical researches that consider the entanglement of a mechanical resonator with a cavity field created by red-detuned continuous-wave or blue-detuned pulsed driving field, we find that applying blue-detuned continuous-wave pump field to cavity optomechanical systems can achieve considerable degrees of quantum entanglement, which is generally challenging to obtain at high temperature for the known physical systems. The competition between the induced squeezing-type interaction and the existing decoherence leads to stable entanglement in dynamically unstable regime. There is a much more relaxed condition for the existence of entanglement, as compared with the well-known criterion for neglecting the thermal decoherence on optomechanically coupled systems. A simple relation about a boundary in the parameter space, across which the entanglement can exist or not, is found with an analytical expression for the degree of the achieved entanglement at any temperature, which is derived for the systems of highly resolved sideband. The studied scenario with blue-detuned continuous-wave driving field can greatly simplify the generation of the widely interested optomechanical entanglement of macroscopic quantum states. Our study also provides the answers to two fundamentally meaningful open problems: (1) what is the condition for a system to avoid its loss of quantum entanglement under thermal decoherence? (2) is it possible to preserve the entanglement in a thermal environment by increasing the interaction that entangles the subsystems?

preprint2017arXiv

Parity-time-symmetric optical lattice with alternating gain and loss atomic configurations

Since the spatially extended periodic parity-time (PT) symmetric potential can possess certain unique properties compared to a single PT cell (with only a pair of coupled gain-loss components), various schemes have been proposed to realize periodic PT-symmetric potentials based on optical lattices. Here, we experimentally construct a spatially periodic PT-symmetric optical potential based on gain-loss arrays induced in a coherently-prepared atomic medium. The gain and loss arrays are generated in alternating four-level N-type and three-level $Λ$-type configurations in the same atomic medium, respectively, which do not require discrete diffractions as demonstrated in the previous work [Phys. Rev. Lett. 117, 123601(2016)] and can be easier to realize with more relaxed operating conditions. The dynamical behaviors of the system are investigated by measuring the phase difference between two adjacent gain and loss channels. The demonstrated PT-symmetric optical lattice with easy accessibility and better tunability sets a new stage for further exploiting the peculiar physical properties in periodic non-Hermitian systems.

preprint2016arXiv

Adaptive and Scalable Android Malware Detection through Online Learning

It is well-known that malware constantly evolves so as to evade detection and this causes the entire malware population to be non-stationary. Contrary to this fact, prior works on machine learning based Android malware detection have assumed that the distribution of the observed malware characteristics (i.e., features) does not change over time. In this work, we address the problem of malware population drift and propose a novel online machine learning based framework, named DroidOL, to handle it and effectively detect malware. In order to perform accurate detection, security-sensitive behaviors are captured from apps in the form of inter-procedural control-flow sub-graph features using a state-of-the-art graph kernel. In order to perform scalable detection and to adapt to the drift and evolution in malware population, an online passive-aggressive classifier is used. In a large-scale comparative analysis with more than 87,000 apps, DroidOL achieves 84.29% accuracy, outperforming two state-of-the-art malware techniques by more than 20% in their typical batch learning setting and more than 3% when they are continuously re-trained. Our experimental findings strongly indicate that online learning based approaches are highly suitable for real-world malware detection.

preprint2016arXiv

Contextual Weisfeiler-Lehman Graph Kernel For Malware Detection

In this paper, we propose a novel graph kernel specifically to address a challenging problem in the field of cyber-security, namely, malware detection. Previous research has revealed the following: (1) Graph representations of programs are ideally suited for malware detection as they are robust against several attacks, (2) Besides capturing topological neighbourhoods (i.e., structural information) from these graphs it is important to capture the context under which the neighbourhoods are reachable to accurately detect malicious neighbourhoods. We observe that state-of-the-art graph kernels, such as Weisfeiler-Lehman kernel (WLK) capture the structural information well but fail to capture contextual information. To address this, we develop the Contextual Weisfeiler-Lehman kernel (CWLK) which is capable of capturing both these types of information. We show that for the malware detection problem, CWLK is more expressive and hence more accurate than WLK while maintaining comparable efficiency. Through our large-scale experiments with more than 50,000 real-world Android apps, we demonstrate that CWLK outperforms two state-of-the-art graph kernels (including WLK) and three malware detection techniques by more than 5.27% and 4.87% F-measure, respectively, while maintaining high efficiency. This high accuracy and efficiency make CWLK suitable for large-scale real-world malware detection.

preprint2016arXiv

Dynamic matrix factorization with social influence

Matrix factorization is a key component of collaborative filtering-based recommendation systems because it allows us to complete sparse user-by-item ratings matrices under a low-rank assumption that encodes the belief that similar users give similar ratings and that similar items garner similar ratings. This paradigm has had immeasurable practical success, but it is not the complete story for understanding and inferring the preferences of people. First, peoples' preferences and their observable manifestations as ratings evolve over time along general patterns of trajectories. Second, an individual person's preferences evolve over time through influence of their social connections. In this paper, we develop a unified process model for both types of dynamics within a state space approach, together with an efficient optimization scheme for estimation within that model. The model combines elements from recent developments in dynamic matrix factorization, opinion dynamics and social learning, and trust-based recommendation. The estimation builds upon recent advances in numerical nonlinear optimization. Empirical results on a large-scale data set from the Epinions website demonstrate consistent reduction in root mean squared error by consideration of the two types of dynamics.

preprint2016arXiv

Dynamical Phonon Laser in Coupled Active-Passive Microresonators

Effective transition between the population-inverted optical eigenmodes of two coupled microcavities carrying mechanical oscillation realizes a phonon analogue of optical two-level laser. By providing an approach that linearizes the dynamical equations of weak nonlinear systems without relying on their steady states, we study such phonon laser action as a realistic dynamical process, which exhibits time-dependent stimulated phonon field amplification especially when one of the cavities is added with optical gain medium. The approach we present explicitly gives the conditions for the optimum phonon lasing, and thermal noise is found to be capable of facilitating the phonon laser action significantly.

preprint2016arXiv

Interacting photon pulses in Rydberg medium

The understanding of dynamical evolutions of interacting photon pulses in Rydberg atomic ensemble is the prerequisite for realizing quantum devices with such system. We present an approach that efficiently simulates the dynamical processes, using a set of local functions we construct to reflect the profiles of narrowband pulses. For two counter-propagating photon pulses, our approach predicts the distinct phenomena from the widely concerned Rydberg blockade to the previously less noticed significant absorption in the anomalous dispersion regime, which can occur by respectively setting the pulse frequency to the appropriate values. Our numerical simulations also demonstrate how spatially extending photon pulses become deformed under realistic non-uniform interaction over their distributions.

preprint2016arXiv

Observation of Parity-Time Symmetry in Optically Induced Atomic Lattices

We experimentally demonstrate PT-symmetric optical lattices with periodical gain and loss profiles in a coherently-prepared four-level N-type atomic system. By appropriately tuning the pertinent atomic parameters, the onset of PT-symmetry breaking is observed through measuring an abrupt phase-shift jump between adjacent gain and loss waveguides. The experimental realization of such readily reconfigurable and effectively controllable PT-symmetric waveguide array structure sets a new stage for further exploiting and better understanding the peculiar physical properties of these non-Hermitian systems in atomic settings.

preprint2015arXiv

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

We study the optimal rates of convergence for estimating a prior distribution over a VC class from a sequence of independent data sets respectively labeled by independent target functions sampled from the prior. We specifically derive upper and lower bounds on the optimal rates under a smoothness condition on the correct prior, with the number of samples per data set equal to the VC dimension. These results have implications for the improvements achievable via transfer learning. We additionally extend this setting to real-valued functions, where we establish consistency of an estimator for the prior, and discuss an additional application to a preference elicitation problem in algorithmic economics.

preprint2015arXiv

Combining upper limits with a Bayesian approach

We discuss how to determine and combine upper limits based on observed events and estimated backgrounds with a Bayesian method, when insignificant signals are observed in independent measurements. In addition to some general features deduced from the analytical formulae, systematic numerical results are obtained by a C++ program (CULBA) for low-count experiments, which can be used as a reference to combine two upper limits.

preprint2015arXiv

Cyclic permutation-time symmetric structure with coupled gain-loss microcavities

We study the coupled even number of microcavities with the balanced gain and loss between any pair of their neighboring components. The effective non-Hermitian Hamiltonian for such structure has the cyclic permutation-time symmetry with respect to the cavity modes, and this symmetry determines the patterns of the dynamical evolutions of the cavity modes. The systems also have multiple exceptional points for the degeneracy of the existing supermodes, exhibiting the "phase transition" of system dynamics across these exceptional points. We illustrate the quantum dynamical properties of the systems with the evolutions of cavity photon numbers and correlation functions. Moreover, we demonstrate the effects of the quantum noises accompanying the amplification and dissipation of the cavity modes. The reciprocal light transportation predicted with the effective non-Hermitian models for the similar couplers is violated by the unavoidable quantum noises.

preprint2015arXiv

Learning with a Drifting Target Concept

We study the problem of learning in the presence of a drifting target concept. Specifically, we provide bounds on the error rate at a given time, given a learner with access to a history of independent samples labeled according to a target concept that can change on each round. One of our main contributions is a refinement of the best previous results for polynomial-time algorithms for the space of linear separators under a uniform distribution. We also provide general results for an algorithm capable of adapting to a variable rate of drift of the target concept. Some of the results also describe an active learning variant of this setting, and provide bounds on the number of queries for the labels of points in the sequence sufficient to obtain the stated bounds on the error rates.

preprint2015arXiv

Nonuniform $(h,k,μ,ν)$-dichotomy and Stability of Nonautonomous Discrete Dynamics

In this paper, a new notion called the general nonuniform $(h,k,μ,ν)$-dichotomy for a sequence of linear operators is proposed, which occurs in a more natural way and is related to nonuniform hyperbolicity. Then, sufficient criteria are established for the existence of nonuniform $(h,k,μ,ν)$-dichotomy in terms of appropriate Lyapunov exponents for the sequence of linear operators. Moreover, we investigate the stability theory of sequences of nonuniformly hyperbolic linear operators in Banach spaces, which admit a nonuniform $(h,k,μ,ν)$-dichotomy. In the case of linear perturbations, we investigate parameter dependence of robustness or roughness of the nonuniform $(h,k,μ,ν)$-dichotomies and show that the stable and unstable subspaces of nonuniform $(h,k,μ,ν)$-dichotomies for the linear perturbed system are Lipschitz continuous for the parameters. In the case of nonlinear perturbations, we construct a new version of the Grobman-Hartman theorem and explore the existence of parameter dependence of stable Lipschitz invariant manifolds when the nonlinear perturbation is of Lipschitz type.

preprint2015arXiv

Nonuniform $(h,k,μ,ν)$-Dichotomy with Applications to Nonautonomous Dynamical Systems

The paper develops and studies a very general notion of dichotomy, referred to as "nonuniform $(h,k,μ,ν)$-dichotomy". The new notion contains as special cases most versions of dichotomy existing in the literature. The paper then provides corresponding new versions of robustness, the Hartman-Grobman theorem, and the stable manifold theorem for nonautonomous dynamical systems in Banach spaces in terms of the nonuniform $(h,k,μ,ν)$-dichotomy.

preprint2015arXiv

Storage and retrieval of photons under their mutual interaction in Rydberg medium

Stopping and regenerating a pair of single-photon pulses at adjacent locations in coherently prepared Rydberg atomic ensembles are significantly affected by their effective interaction mediated by Rydberg excitations, and the similar processes can differ notably from the one exhibiting the common Rydberg blockade as with the stationary propagation of multi-photon light beams in the same medium. Based on the complete dynamics, we reveal the detailed features in such processes by finding how the profiles of the involved quantum fields evolve in various situations. The findings help to determine the proper regimes for implementing photonic gates and transistors. In addition, we discuss the non-adiabatic corrections associated with quickly changing control fields, and illustrate a method that restores the photon pulses' original amplitude during their retrieval unless they are heavily damped before storage.

preprint2014arXiv

Minimax Analysis of Active Learning

This work establishes distribution-free upper and lower bounds on the minimax label complexity of active learning with general hypothesis classes, under various noise models. The results reveal a number of surprising facts. In particular, under the noise model of Tsybakov (2004), the minimax label complexity of active learning with a VC class is always asymptotically smaller than that of passive learning, and is typically significantly smaller than the best previously-published upper bounds in the active learning literature. In high-noise regimes, it turns out that all active learning problems of a given VC dimension have roughly the same minimax label complexity, which contrasts with well-known results for bounded noise. In low-noise regimes, we find that the label complexity is well-characterized by a simple combinatorial complexity measure we call the star number. Interestingly, we find that almost all of the complexity measures previously explored in the active learning literature have worst-case values exactly equal to the star number. We also propose new active learning strategies that nearly achieve these minimax label complexities.

preprint2014arXiv

Nanowire Spin Torque Oscillator Driven by Spin Orbit Torques

Spin torque from spin current applied to a nanoscale region of a ferromagnet can act as negative magnetic damping and thereby excite self-oscillations of its magnetization. In contrast, spin torque uniformly applied to the magnetization of an extended ferromagnetic film does not generate self-oscillatory magnetic dynamics but leads to reduction of the saturation magnetization. Here we report studies of the effect of spin torque on a system of intermediate dimensionality - a ferromagnetic nanowire. We observe coherent self-oscillations of magnetization in a ferromagnetic nanowire serving as the active region of a spin torque oscillator driven by spin orbit torques. Our work demonstrates that magnetization self-oscillations can be excited in a one-dimensional magnetic system and that dimensions of the active region of spin torque oscillators can be extended beyond the nanometer length scale.

preprint2014arXiv

Plasmonic and Metamaterial Structures as Electromagnetic Absorbers

Electromagnetic absorbers have drawn increasing attention in many areas. A series of plasmonic and metamaterial structures can work as efficient narrow band absorbers due to the excitation of plasmonic or photonic resonances, providing a great potential for applications in designing selective thermal emitters, bio-sensing, etc. In other applications such as solar energy harvesting and photonic detection, the bandwidth of light absorbers is required to be quite broad. Under such a background, a variety of mechanisms of broadband/multiband absorption have been proposed, such as mixing multiple resonances together, exciting phase resonances, slowing down light by anisotropic metamaterials, employing high loss materials and so on.

preprint2013arXiv

A Polarization-insensitive and high-efficiency Schottky Photodetector integrated with a silicon ridge waveguide

We propose a polarization-insensitive and high-efficiency plasmonic silicon Schottky diode for detection of sub-bandgap photons in the optical communication wavelength range through internal photoemission. Our photodiode is based on a hybrid plasmonic silicon waveguide. It has a gold film covering both the top and the sidewalls of a dielectric silicon waveguide with the Schottky contact formed at the gold-silicon interface. An extensive physical model is presented in detail and applied to calculate and analyze the performance of our detector. By comparison with a diode with only top contact of gold, the polarization sensitivity of responsivity is greatly minimized in our photodetector with sidewall coverage of gold. Much higher responsivities for both polarizations are also achieved in a very broad wavelength range of 1.2-1.5 μm. Moreover, the Schottky contact is only 4 μm long, leading to a very small dark current. Our design is very promising for practical applications in high-density silicon photonic integration.

preprint2013arXiv

A variant of Hörmander's $L^2$ existence theorem for Dirac operator in Clifford analysis

In this paper, we give Hörmander's $L^2$ theorem for the Dirac operator over an open subset $Ω\subset\mathbb{R}^{n+1}$ with Clifford algebra. A sufficient condition for the existence of weak solutions for the Dirac operator is found in the sense of Clifford analysis. In particular, if $Ω$ is bounded, we prove that for any $f$ in the $L^2$ space with values in the Clifford algebra, there exists a weak solution $u$ of the Dirac operator such that $\bar{D}u=f$, with $u$ in the $L^2$ space as well. The method is based on Hörmander's $L^2$ existence theorem in complex analysis, and the weighted $L^2$ space is utilised.

preprint2013arXiv

AIR: fused Analytical and Iterative Reconstruction method for computed tomography

Purpose: CT image reconstruction techniques fall into two major categories: analytical reconstruction (AR) methods and iterative reconstruction (IR) methods. AR reconstructs images through analytical formulas, such as filtered backprojection (FBP) in 2D and the Feldkamp-Davis-Kress (FDK) method in 3D, which can be either mathematically exact or approximate. On the other hand, IR is often based on the discrete forward model of the X-ray transform and formulated as a minimization problem with some appropriate image regularization method, so that the reconstructed image corresponds to the minimizer of the optimization problem. This work investigates the fused analytical and iterative reconstruction (AIR) method. Methods: Based on IR with L1-type image regularization, AIR is formulated with an AR-specific preconditioner in the data fidelity term, which results in a minimal change to the solution algorithm: the adjoint X-ray transform is replaced by the filtered X-ray transform. As a proof-of-concept 2D example of AIR, FBP is incorporated into tensor framelet (TF) regularization based IR, and the formulated AIR minimization problem is then solved through the split Bregman method with GPU-accelerated X-ray transform and filtered adjoint X-ray transform. Conclusion: AIR, the fused Analytical and Iterative Reconstruction method, is proposed with a proof-of-concept 2D example to synergize FBP and TF-regularized IR, with improved image resolution and contrast for experimental data. The potential impact of AIR is that it offers a general framework to develop various AR-enhanced IR methods, when neither AR nor IR alone is sufficient.

preprint2013arXiv

An inverse problem of identifying the radiative coefficient in a degenerate parabolic equation

This work investigates an inverse problem of determining the radiative coefficient in a degenerate parabolic equation from the final overspecified data. Unlike other inverse coefficient problems, in which the principal coefficients are assumed to be strictly positive definite, the mathematical model discussed in the paper belongs to the second order parabolic equations with non-negative characteristic form, namely that there exists degeneracy on the lateral boundaries of the domain. The uniqueness of the solution is obtained by the contraction mapping principle. Based on the optimal control framework, the problem is transformed into an optimization problem and the existence of the minimizer is established. After the necessary conditions which must be satisfied by the minimizer are deduced, the uniqueness and stability of the minimizer are proved. By minor modification of the cost functional and some a priori regularity conditions imposed on the forward operator, the convergence of the minimizer for the noisy input data is obtained in the paper. The results obtained in the paper are interesting and useful, and can be extended to more general degenerate parabolic equations.

preprint2013arXiv

Towards a System Theoretic Approach to Wireless Network Capacity in Finite Time and Space

In asymptotic regimes, both in time and space (network size), the derivation of network capacity results is grossly simplified by brushing aside queueing behavior in non-Jackson networks. This simplifying double-limit model, however, lends itself to conservative numerical results in finite regimes. To properly account for queueing behavior beyond a simple calculus based on average rates, we advocate a system theoretic methodology for the capacity problem in finite time and space regimes. This methodology also accounts for spatial correlations arising in networks with CSMA/CA scheduling and it delivers rigorous closed-form capacity results in terms of probability distributions. Unlike numerous existing asymptotic results, subject to anecdotal practical concerns, our transient one can be used in practical settings: for example, to compute the time scales at which multi-hop routing is more advantageous than single-hop routing.

preprint2012arXiv

Active Property Testing

One of the motivations for property testing of boolean functions is the idea that testing can serve as a preprocessing step before learning. However, in most machine learning applications, it is not possible to request labels of fictitious examples constructed by the algorithm. Instead, the dominant query paradigm in applied machine learning, called active learning, is one where the algorithm may query for labels, but only on points in a given polynomial-sized (unlabeled) sample, drawn from some underlying distribution D. In this work, we bring this well-studied model in learning to the domain of testing. We show that for a number of important properties, testing can still yield substantial benefits in this setting. This includes testing unions of intervals, testing linear separators, and testing various assumptions used in semi-supervised learning. In addition to these specific results, we also develop a general notion of the testing dimension of a given property with respect to a given distribution. We show this dimension characterizes (up to constant factors) the intrinsic number of label requests needed to test that property. We develop such notions for both the active and passive testing models. We then use these dimensions to prove a number of lower bounds, including for linear separators and the class of dictator functions. Our results show that testing can be a powerful tool in realistic models for learning, and further that active testing exhibits an interesting and rich structure. Our work in addition brings together tools from a range of areas including U-statistics, noise-sensitivity, self-correction, and spectral analysis of random matrices, and develops new tools that may be of independent interest.

preprint2012arXiv

Bayesian Active Distance Metric Learning

Distance metric learning is an important component for many tasks, such as statistical classification and content-based image retrieval. Existing approaches for learning distance metrics from pairwise constraints typically suffer from two major problems. First, most algorithms only offer point estimation of the distance metric and can therefore be unreliable when the number of training examples is small. Second, since these algorithms generally select their training examples at random, they can be inefficient if labeling effort is limited. This paper presents a Bayesian framework for distance metric learning that estimates a posterior distribution for the distance metric from labeled pairwise constraints. We describe an efficient algorithm based on the variational method for the proposed Bayesian approach. Furthermore, we apply the proposed Bayesian framework to active distance metric learning by selecting those unlabeled example pairs with the greatest uncertainty in relative distance. Experiments in classification demonstrate that the proposed framework achieves higher classification accuracy and identifies more informative training examples than the non-Bayesian approach and state-of-the-art distance metric learning algorithms.

preprint2001arXiv

Current induced forces upon atoms adsorbed on conducting carbon nanotubes

We calculate the forces acting upon species adsorbed on a single wall carbon nanotube, in the presence of electric currents. We present a self consistent real space Green function method, which enables us to calculate the current induced forces from an ab-initio Hamiltonian. The method is applied to calculate the force on an adsorbed O atom on a (5,5) carbon nanotube, for different bias voltages and adsorption sites. For good contact regimes and biases of the order of volts, the presence of a current can affect the potential energy surfaces considerably. Implications of these effects for the induced diffusion of the species are analyzed. The dependence of the force on the nanotube radius is studied. In addition, the magnitude of inelastic electron scattering, which induces vibrational heating, and its influence on the adsorbates' drift are discussed.

preprint2001arXiv

Resonant versus anti-resonant tunneling at carbon nanotube A-B-A heterostructures

Narrow antiresonances going to zero transmission are found to occur for general (2n,0)(n,n)(2n,0) carbon nanotube heterostructures, whereas the complementary configuration, (n,n)(2n,0)(n,n), displays simple resonant tunneling behaviour. We compute examples for different cases, and give a simple explanation for the appearance of antiresonances in one case but not in the other. Conditions and ranges for the occurrence of these different behaviors are stated. The phenomenon of anti-resonant tunneling, which has passed unnoticed in previous studies of nanotube heterostructures, adds to the rich set of behaviors available to nanotube based quantum effect devices.

62 published item(s)

Evolutionary Ensemble of Agents

One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning

VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

A Tale of Two Shocks

Interface-Controlled Antiferromagnetic Tunnel Junctions based on a metallic van der Waals A-type Antiferromagnet

Nonvolatile optical control of interlayer stacking order in 1T-TaS2

A Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Industrial Wireless Sensor Networks

An Evolutionary Game based Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Wireless Sensor Networks

An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things

An Intelligent Trust Cloud Management Method for Secure Clustering in 5G enabled Internet of Medical Things

Downwash-aware Control Allocation for Over-actuated UAV Platforms

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Generative Adversarial Learning for Intelligent Trust Management in 6G Wireless Networks

High Spatial and Temporal Resolution NIR-IIb Gastrointestinal Imaging in Mice

Inflation-induced aneurysm formation and evolution in graded cylindrical tubes of arbitrary thickness

Phonon-mediated superconductivity in two-dimensional hydrogenated phosphorus carbide: HPC$_{3}$

Practical and Secure Federated Recommendation with Personalized Masks

Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

Rare Gems: Finding Lottery Tickets at Initialization

Secure Forward Aggregation for Vertical Federated Neural Networks

SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space

Flow-based Generative Models for Learning Manifold to Manifold Mappings

Learning Functional Priors and Posteriors from Data and Physics

Measure-conditional Discriminator with Stationary Optimum for GANs and Statistical Distance Surrogates

Robust propagation of internal coastal Kelvin waves in complex domains

Weakly-Supervised Open-Retrieval Conversational Question Answering

APTER: Aggregated Prognosis Through Exponential Reweighting

IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems

Match$^2$: A Matching over Matching Model for Similar Question Identification

Open-Retrieval Conversational Question Answering

Particle Filtering Under General Regime Switching

Reinforcement Learning for Active Flow Control in Experiments

Solving Inverse Stochastic Problems from Discrete Particle Observations Using the Fokker-Planck Equation and Physics-informed Neural Networks

Sparse Sinkhorn Attention

Reflection Phase Shift of One-dimensional Plasmon Polaritons in Carbon Nanotubes

Entangling light field with mechanical resonator at high temperature

Parity-time-symmetric optical lattice with alternating gain and loss atomic configurations

Adaptive and Scalable Android Malware Detection through Online Learning

Contextual Weisfeiler-Lehman Graph Kernel For Malware Detection

Dynamic matrix factorization with social influence

Dynamical Phonon Laser in Coupled Active-Passive Microresonators

Interacting photon pulses in Rydberg medium

Observation of Parity-Time Symmetry in Optically Induced Atomic Lattices

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Combining upper limits with a Bayesian approach

Cyclic permutation-time symmetric structure with coupled gain-loss microcavities

Learning with a Drifting Target Concept

Nonuniform $(h,k,μ,ν)$-dichotomy and Stability of Nonautonomous Discrete Dynamics

Nonuniform $(h,k,μ,ν)$-Dichotomy with Applications to Nonautonomous Dynamical Systems

Storage and retrieval of photons under their mutual interaction in Rydberg medium

Minimax Analysis of Active Learning

Nanowire Spin Torque Oscillator Driven by Spin Orbit Torques

Plasmonic and Metamaterial Structures as Electromagnetic Absorbers

A Polarization-insensitive and high-efficiency Schottky Photodetector integrated with a silicon ridge waveguide

A variant of Hörmander's $L^2$ existence theorem for Dirac operator in Clifford analysis

AIR: fused Analytical and Iterative Reconstruction method for computed tomography

An inverse problem of identifying the radiative coefficient in a degenerate parabolic equation

Towards a System Theoretic Approach to Wireless Network Capacity in Finite Time and Space

Active Property Testing

Bayesian Active Distance Metric Learning

Current induced forces upon atoms adsorbed on conducting carbon nanotubes

Resonant versus anti-resonant tunneling at carbon nanotube A-B-A heterostructures