Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
54 works
0 followers
33 topics
4 close collaborators


Published work

54 published item(s)

preprint · 2026 · arXiv

CL-bench Life: Can Language Models Learn from Real-Life Context?

Today's AI assistants such as OpenClaw are designed to handle context effectively, making context learning an increasingly important capability for models. As these systems move beyond professional settings into everyday life, the nature of the contexts they must handle also shifts. Real-life contexts are often messy, fragmented, and deeply tied to personal and social experience, such as multi-party conversations, personal archives, and behavioral traces. Yet it remains unclear whether current frontier language models can reliably learn from such contexts and solve tasks grounded in them. To this end, we introduce CL-bench Life, a fully human-curated benchmark comprising 405 context-task pairs and 5,348 verification rubrics, covering common real-life scenarios. Solving tasks in CL-bench Life requires models to reason over complex, messy real-life contexts, calling for strong real-life context learning abilities that go far beyond those evaluated in existing benchmarks. We evaluate ten frontier LMs and find that real-life context learning remains highly challenging: even the best-performing model achieves only 19.3% task solving rate, while the average performance across models is only 13.8%. Models still struggle to reason over contexts such as messy group chat histories and fragmented behavioral records from everyday life. CL-bench Life provides a crucial testbed for advancing real-life context learning, and progress on it can enable more intelligent and reliable AI assistants in everyday life.

preprint · 2025 · arXiv

HY-MT1.5 Technical Report

In this report, we introduce our latest translation models, HY-MT1.5-1.8B and HY-MT1.5-7B, a new family of machine translation models developed through a holistic training framework tailored for high-performance translation. Our methodology orchestrates a multi-stage pipeline that integrates general and MT-oriented pre-training, supervised fine-tuning, on-policy distillation, and reinforcement learning. HY-MT1.5-1.8B, the 1.8B-parameter model, demonstrates remarkable parameter efficiency, comprehensively outperforming significantly larger open-source baselines (e.g., Tower-Plus-72B, Qwen3-32B) and mainstream commercial APIs (e.g., Microsoft Translator, Doubao Translator) in standard Chinese-foreign and English-foreign tasks. It achieves approximately 90% of the performance of ultra-large proprietary models such as Gemini-3.0-Pro; although it marginally trails Gemini-3.0-Pro on WMT25 and Mandarin-minority language benchmarks, it maintains a substantial lead over other competing models. Furthermore, HY-MT1.5-7B establishes a new state-of-the-art for its size class, achieving 95% of Gemini-3.0-Pro's performance on Flores-200 and surpassing it on the challenging WMT25 and Mandarin-minority language test sets. Beyond standard translation, the HY-MT1.5 series supports advanced constraints, including terminology intervention, context-aware translation, and format preservation. Extensive empirical evaluations confirm that both models offer highly competitive, robust solutions for general and specialized translation tasks within their respective parameter scales.

preprint · 2025 · arXiv

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Spurious bias, a tendency to exploit spurious correlations between superficial input attributes and prediction targets, has revealed a severe robustness pitfall in classical machine learning problems. Multimodal Large Language Models (MLLMs), which leverage pretrained vision and language models, have recently demonstrated strong capability in joint vision-language understanding. However, both the presence and severity of spurious biases in MLLMs remain poorly understood. In this work, we address this gap by analyzing the spurious biases in the multimodal setting and uncovering the specific inference-time data patterns that can manifest this problem. To support this analysis, we introduce MM-SpuBench, a comprehensive, human-verified benchmark dataset consisting of image-class pairs annotated with core and spurious attributes, grounded in our taxonomy of nine distinct types of spurious correlations. The benchmark is constructed using human-interpretable attribute information to capture a wide range of spurious patterns reflective of real-world knowledge. Leveraging this benchmark, we conduct a comprehensive evaluation of the state-of-the-art open-source and proprietary MLLMs with both standard accuracy and the proposed Conditional Generation Likelihood Advantage (CGLA). Our findings highlight the persistence of reliance on spurious correlations and the difficulty of mitigation on our benchmark. We hope this work can inspire new technical strides to mitigate these biases. Our benchmark is publicly available at https://huggingface.co/datasets/mmbench/MM-SpuBench.

preprint · 2024 · arXiv

A comprehensive framework for occluded human pose estimation

Occlusion presents a significant challenge in human pose estimation. The challenges posed by occlusion can be attributed to the following factors: 1) Data: The collection and annotation of occluded human pose samples are relatively challenging. 2) Feature: Occlusion can cause feature confusion due to the high similarity between the target person and interfering individuals. 3) Inference: Robust inference becomes challenging due to the loss of complete body structural information. The existing methods designed for occluded human pose estimation usually focus on addressing only one of these factors. In this paper, we propose a comprehensive framework DAG (Data, Attention, Graph) to address the performance degradation caused by occlusion. Specifically, we introduce the mask joints with instance paste data augmentation technique to simulate occlusion scenarios. Additionally, an Adaptive Discriminative Attention Module (ADAM) is proposed to effectively enhance the features of target individuals. Furthermore, we present the Feature-Guided Multi-Hop GCN (FGMP-GCN) to fully explore the prior knowledge of body structure and improve pose estimation results. Through extensive experiments conducted on three benchmark datasets for occluded human pose estimation, we demonstrate that the proposed method outperforms existing methods. Code and data will be publicly available.

preprint · 2023 · arXiv

Entangling spins using cubic nonlinear dynamics

Entangled states with a large number of $N$ atomic spins are a key ingredient for quantum information processing and quantum metrology. To date, the preparation of such states has mainly relied on quadratic nonlinear dynamics. Here, we investigate the preparation of spin-spin multipartite entanglement, witnessed by quantum Fisher information, by using cubic nonlinear dynamics. We find that, in the regime of weak coupling, the cubic scheme can greatly speed up the rate of entanglement generation as compared to the quadratic scheme (about $N$ times faster). In the strong coupling regime, the cubic nonlinear dynamics enables the time-periodic generation of a broad variety of new-type macroscopic superposition states, which allow us to realize near-Heisenberg-limit phase sensitivity. In addition, we also reveal an interesting feature: the amount of entanglement generated by the cubic scheme has a macroscopic sensitivity to the parity of $N$, which has no counterpart in quadratic nonlinear dynamics and can be exploited for sensing the parity of $N$ at the single-spin level. We also propose a new approach for fast and high-fidelity generation of maximally entangled Greenberger-Horne-Zeilinger (GHZ) states. By using an alternative cubic-quadratic-admixture type of nonlinear interaction, we show that one may accelerate the procedure of GHZ-state generation. The realization of the cubic nonlinear dynamics is also considered, showing that it can be realized either by repeatedly using linear and quadratic nonlinear dynamics or by utilizing light-mediated interactions in just one step. Finally, by taking realistic imperfections into account, we find that the cubic scheme is sensitive to single-spin decay in the strong coupling regime, while it is robust against collective dephasing.

preprint · 2023 · arXiv

High Dimensional Statistical Estimation under Uniformly Dithered One-bit Quantization

In this paper, we propose a uniformly dithered 1-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to the estimation problems of sparse covariance matrix estimation, sparse linear regression (i.e., compressed sensing), and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of heavy-tailed data is assumed to have bounded moments of some order. We propose new estimators based on 1-bit quantized data. In the sub-Gaussian regime, our estimators achieve near minimax rates, indicating that our quantization scheme costs very little. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first ones in a 1-bit quantized and heavy-tailed setting, or already improve on existing comparable results in some respects. Under the observations in our setting, the rates are almost tight in compressed sensing and matrix completion. Our 1-bit compressed sensing results feature a general sensing vector that is sub-Gaussian or even heavy-tailed. We are also the first to investigate a novel setting where both the covariate and response are quantized. In addition, our approach to 1-bit matrix completion does not rely on likelihood and represents the first method robust to pre-quantization noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.
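
The truncation, dithering, and quantization steps named in the abstract above can be illustrated on the simplest possible problem, estimating a scalar mean. The following is a minimal sketch under stated assumptions, not the paper's estimators: the function names `dithered_one_bit` and `estimate_mean`, the threshold parameter `delta`, and the choice of a scalar mean as the target are all hypothetical and chosen only for illustration. It relies on the standard fact that, for |x| ≤ delta and dither drawn uniformly from [-delta, delta], the expectation of delta · sign(x + dither) equals x.

```python
import random


def dithered_one_bit(x, delta, rng):
    """Return a single bit (+1.0 or -1.0) for the sample x.

    Steps (illustrative assumption, mirroring the abstract's wording):
      1. truncation: clip x to [-delta, delta]
      2. dithering: add noise drawn uniformly from [-delta, delta]
      3. quantization: keep only the sign
    """
    x_trunc = max(-delta, min(delta, x))
    tau = rng.uniform(-delta, delta)
    return 1.0 if x_trunc + tau >= 0 else -1.0


def estimate_mean(samples, delta, seed=0):
    """Estimate the mean of the (truncated) samples from 1-bit data only.

    For |x| <= delta, E[delta * bit] = x, so the rescaled average of the
    bits is an unbiased estimate of the mean of the truncated samples.
    """
    rng = random.Random(seed)
    bits = [dithered_one_bit(x, delta, rng) for x in samples]
    return delta * sum(bits) / len(bits)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.3, 0.1) for _ in range(50_000)]
    print(estimate_mean(data, delta=1.0))
```

In this toy setting the threshold `delta` trades truncation bias against the variance introduced by the dither: a larger `delta` clips fewer samples but makes each bit noisier.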

preprint · 2023 · arXiv

USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval

As a fundamental and challenging task in bridging language and vision domains, Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality, and its key challenge is to measure the semantic similarity across different modalities. Although significant progress has been achieved, existing approaches typically suffer from two major limitations: (1) It hurts the accuracy of the representation by directly exploiting the bottom-up attention based region-level features where each region is equally treated. (2) It limits the scale of negative sample pairs by employing the mini-batch based end-to-end training mechanism. To address these limitations, we propose a Unified Semantic Enhancement Momentum Contrastive Learning (USER) method for ITR. Specifically, we delicately design two simple but effective Global representation based Semantic Enhancement (GSE) modules. One learns the global representation via the self-attention algorithm, noted as Self-Guided Enhancement (SGE) module. The other module benefits from the pre-trained CLIP module, which provides a novel scheme to exploit and transfer the knowledge from an off-the-shelf model, noted as CLIP-Guided Enhancement (CGE) module. Moreover, we incorporate the training mechanism of MoCo into ITR, in which two dynamic queues are employed to enrich and enlarge the scale of negative sample pairs. Meanwhile, a Unified Training Objective (UTO) is developed to learn from mini-batch based and dynamic queue based samples. Extensive experiments on the benchmark MSCOCO and Flickr30K datasets demonstrate the superiority of both retrieval accuracy and inference efficiency. Our source code will be released at https://github.com/zhangy0822/USER.

preprint · 2022 · arXiv

A Survey of Multi-Tenant Deep Learning Inference on GPU

Deep Learning (DL) models have achieved superior performance. Meanwhile, computing hardware such as NVIDIA GPUs has also demonstrated strong scaling trends, with 2x throughput and memory bandwidth for each generation. With such strong scaling of GPUs, multi-tenant deep learning inference, which co-locates multiple DL models on the same GPU, has become widely deployed to improve resource utilization, enhance serving throughput, reduce energy cost, etc. However, achieving efficient multi-tenant DL inference is challenging and requires thorough full-stack system optimization. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU. By overviewing the entire optimization stack, summarizing the multi-tenant computing innovations, and elaborating on the recent technological advances, we hope that this survey can shed light on new optimization perspectives and motivate novel work in future large-scale DL system optimization.

preprint · 2022 · arXiv

Accelerated numerical algorithms for steady states of Gross-Pitaevskii equations coupled with microwaves

We present two accelerated numerical algorithms for single-component and binary Gross-Pitaevskii (GP) equations coupled with microwaves (electromagnetic fields) in steady state. One is based on a normalized gradient flow formulation, called the ASGF method, while the other on a perturbed, projected conjugate gradient approach for the nonlinear constrained optimization, called the PPNCG method. The coupled GP equations are nonlocal in space, describing pseudo-spinor Bose-Einstein condensates (BECs) interacting with an electromagnetic field. Our interest in this study is to develop efficient, iterative numerical methods for steady symmetric and central vortex states of the nonlocal GP equation systems. In the algorithms, the GP equations are discretized by a Legendre-Galerkin spectral method in a polar coordinate in two-dimensional (2D) space. The new algorithms are shown to outperform the existing ones through a host of benchmark examples, among which the PPNCG method performs the best. Additional numerical simulations of the central vortex states are provided to demonstrate the usefulness and efficiency of the new algorithms.

preprint · 2022 · arXiv

Broad Learning System with Takagi-Sugeno Fuzzy Subsystem for Tobacco Origin Identification based on Near Infrared Spectroscopy

Tobacco origin identification is of significant importance in the tobacco industry. Modeling analysis of sensor data with near infrared spectroscopy has become a popular method for rapid detection of internal features. However, for sensor data analysis using traditional artificial neural network or deep network models, the training process is extremely time-consuming. In this paper, a novel broad learning system with a Takagi-Sugeno (TS) fuzzy subsystem is proposed for rapid identification of tobacco origin. Incremental learning is employed in the proposed method, which obtains the weight matrix of the network after a very small amount of computation, resulting in much shorter training time for the model, with only about 3 seconds for the extra step training. The experimental results show that the TS fuzzy subsystem can extract features from the near infrared data and effectively improve the recognition performance. The proposed method achieves the highest prediction accuracy (95.59%) in comparison to traditional classification algorithms, an artificial neural network, and a deep convolutional neural network, and has a great advantage in training time, with only about 128 seconds.

preprint · 2022 · arXiv

C-P Map: A Novel Evaluation Toolkit for Speaker Verification

Evaluation trials are used to probe the performance of automatic speaker verification (ASV) systems. In spite of their clear importance and impact, evaluation trials have not been seriously treated in research and engineering practice. This paper first presents a theoretical analysis of evaluation trials and highlights potential bias in the most popular cross-pairing approach used in trial design. To interpret and settle this problem, we define the concept of a trial config and the C-P map derived from it. The C-P map measures the performance of an ASV system on various trial configs in a 2-dimensional map. On the map, each location represents a particular trial config and its corresponding color represents the system performance. Experiments conducted on representative ASV systems show that the proposed C-P map offers a powerful evaluation toolkit for ASV performance analysis and comparison. The source code for the C-P map has been released at https://gitlab.com/csltstu/sunine.

preprint · 2022 · arXiv

Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning

Continual Learning (CL) is an emerging machine learning paradigm that aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks. To avoid performance decrease caused by forgetting, prior studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data. Despite the promising results, since CL is often assumed to execute on mobile or IoT devices, the EM size is bounded by the small hardware memory capacity and makes it infeasible to meet the accuracy requirements for real-world applications. Specifically, all prior CL methods discard samples overflowed from the EM and can never retrieve them back for subsequent training steps, incurring loss of information that would exacerbate catastrophic forgetting. We explore a novel hierarchical EM management strategy to address the forgetting issue. In particular, in mobile and IoT devices, real-time data can be stored not just in high-speed RAMs but in internal storage devices as well, which offer significantly larger capacity than the RAMs. Based on this insight, we propose to exploit the abundant storage to preserve past experiences and alleviate the forgetting by allowing CL to efficiently migrate samples between memory and storage without being interfered by the slow access speed of the storage. We call it Carousel Memory (CarM). As CarM is complementary to existing CL methods, we conduct extensive evaluations of our method with seven popular CL methods and show that CarM significantly improves the accuracy of the methods across different settings by large margins in final average accuracy (up to 28.4%) while retaining the same training efficiency.

preprint · 2022 · arXiv

Deep neural network heatmaps capture Alzheimer's disease patterns reported in a large meta-analysis of neuroimaging studies

Deep neural networks currently provide the most advanced and accurate machine learning models to distinguish between structural MRI scans of subjects with Alzheimer's disease and healthy controls. Unfortunately, the subtle brain alterations captured by these models are difficult to interpret because of the complexity of these multi-layer and non-linear models. Several heatmap methods have been proposed to address this issue and analyze the imaging patterns extracted from the deep neural networks, but no quantitative comparison between these methods has been carried out so far. In this work, we explore these questions by deriving heatmaps from Convolutional Neural Networks (CNN) trained using T1 MRI scans of the ADNI data set, and by comparing these heatmaps with brain maps corresponding to Support Vector Machines (SVM) coefficients. Three prominent heatmap methods are studied: Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG), and Guided Grad-CAM (GGC). Contrary to prior studies where the quality of heatmaps was visually or qualitatively assessed, we obtained precise quantitative measures by computing overlap with a ground-truth map from a large meta-analysis that combined 77 voxel-based morphometry (VBM) studies independently from ADNI. Our results indicate that all three heatmap methods were able to capture brain regions covering the meta-analysis map and achieved better results than SVM coefficients. Among them, IG produced the heatmaps with the best overlap with the independent meta-analysis.

preprint · 2022 · arXiv

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

(Gradient) Expectation Maximization (EM) is a widely used algorithm for estimating the maximum likelihood of mixture models or incomplete data problems. A major challenge facing this popular technique is how to effectively preserve the privacy of sensitive data. Previous research on this problem has already led to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM. However, unlike in the non-private case, existing techniques are not yet able to provide finite sample statistical guarantees. To address this issue, we propose in this paper the first DP version of the (Gradient) EM algorithm with statistical guarantees. Moreover, we apply our general framework to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions Model (MRM) and Linear Regression with Missing Covariates (RMC). Specifically, for GMM in the DP model, our estimation error is near optimal in some cases. For the other two models, we provide the first finite sample statistical guarantees. Our theory is supported by thorough numerical experiments.

preprint · 2022 · arXiv

Differentially Private $\ell_1$-norm Linear Regression with Heavy-tailed Data

We study the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) with heavy-tailed data. Specifically, we focus on $\ell_1$-norm linear regression in the $ε$-DP model. While most of the previous work focuses on the case where the loss function is Lipschitz, here we only need to assume that the variates have bounded moments. First, we study the case where the $\ell_2$ norm of the data has bounded second order moment. We propose an algorithm which is based on the exponential mechanism and show that it is possible to achieve an upper bound of $\tilde{O}(\sqrt{\frac{d}{nε}})$ (with high probability). Next, we relax the assumption to bounded $θ$-th order moment with some $θ\in (1, 2)$ and show that it is possible to achieve an upper bound of $\tilde{O}(({\frac{d}{nε}})^\frac{θ-1}θ)$. Our algorithms can also be extended to more relaxed cases where only each coordinate of the data has bounded moments, and we can get an upper bound of $\tilde{O}({\frac{d}{\sqrt{nε}}})$ and $\tilde{O}({\frac{d}{({nε})^\frac{θ-1}θ}})$ in the second and $θ$-th moment case respectively.

preprint · 2022 · arXiv

Direct observation of strong anomalous Hall effect and proximity-induced ferromagnetic state in SrIrO3

The 5d iridium-based transition metal oxides have gained broad interest because of their strong spin-orbit coupling, which favors new or exotic quantum electronic states. On the other hand, they rarely exhibit more mainstream orders like ferromagnetism due to generally weak electron-electron correlation strength. Here, we show a proximity-induced ferromagnetic (FM) state with $T_C$ = 100 K and strong magnetocrystalline anisotropy in a SrIrO3 (SIO) heterostructure via interfacial charge transfer by using a ferromagnetic insulator in contact with SIO. Electrical transport allows us to selectively probe the FM state of the SIO layer and to directly observe a strong, intrinsic and positive anomalous Hall effect (AHE). For T < 20 K, the AHE displays unusually large coercive and saturation fields, a fingerprint of a strong pseudospin-lattice coupling. A Hall angle, $σ_{xy}^{AHE}/σ_{xx}$, larger by an order of magnitude than in typical 3d metals and an FM net moment of about 0.1 $μ_B$/Ir are reported. This emphasizes how efficiently the nontrivial topological band properties of SIO can be manipulated by structural modifications and the exchange interaction with 3d TMOs.

preprint · 2022 · arXiv

Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data

In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(ε, δ)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $α$ (with high probability) is ${O}(p α^{-2})$ and $\tilde{O}(p^3α^{-2}ε^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $α^{-1}$, or exponential in $p$ sample complexities of GLMs with no public data. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(ε, δ)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $α$ is ${O}(p^2α^{-2})$ and $\tilde{O}(p^2α^{-2}ε^{-2})$ respectively, under some mild assumptions and if $α$ is not too small (i.e., $α\geq Ω(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets.

preprint · 2022 · arXiv

Faster Rates of Private Stochastic Convex Optimization

In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) and provide excess population risks for some special classes of functions that are faster than the previous results of general convex and strongly convex functions. In the first part of the paper, we study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with some parameter $θ>1$. Specifically, we first show that under some mild assumptions on the loss functions, there is an algorithm whose output could achieve an upper bound of $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log \frac{1}δ}}{nε})^\fracθ{θ-1})$ for $(ε, δ)$-DP when $θ\geq 2$, where $n$ is the sample size and $d$ is the dimension of the space. Then we address the inefficiency issue, improve the upper bounds by $\text{Poly}(\log n)$ factors and extend to the case where $θ\geq \barθ>1$ for some known $\barθ$. Next we show that the excess population risk of population functions satisfying TNC with parameter $θ\geq 2$ is always lower bounded by $Ω((\frac{d}{nε})^\fracθ{θ-1}) $ and $Ω((\frac{\sqrt{d\log \frac{1}δ}}{nε})^\fracθ{θ-1})$ for $ε$-DP and $(ε, δ)$-DP, respectively. In the second part, we focus on a special case where the population risk function is strongly convex. Unlike the previous studies, here we assume that the loss function is non-negative and that the optimal value of the population risk is sufficiently small. With these additional assumptions, we propose a new method whose output could achieve an upper bound of $O(\frac{d\log\frac{1}δ}{n^2ε^2}+\frac{1}{n^τ})$ for any $τ\geq 1$ in the $(ε,δ)$-DP model if the sample size $n$ is sufficiently large.

preprint · 2022 · arXiv

From topological amplitude to rescattering dynamics

We propose a theoretical framework to correlate the topological diagram at quark level and rescattering dynamics at hadron level. In the framework, both the hadronic triangle diagram and the topological-scattering diagram, which is the intermediate structure between the topological diagram and the triangle diagram, are expressed in tensor form. The completeness of the topological-scattering diagram is confirmed by the quark substructure of meson-meson scattering. The coefficient of each triangle diagram can be derived from the topological-scattering diagram, and the total rescattering amplitudes are consistent with the ones derived from the chiral Lagrangian. If only the short-distance $T$ diagram is considered as the weak vertex in the triangle diagram, the rescattering contributions in the $C$, $E$ and $P$ diagrams have a definite proportional relation of $L(C):L(E):L(P)=-2:1:1$ under the $SU(3)_F$ symmetry, and the rescattering contributions in the $T$ and $A$ diagrams only arise from the $SU(3)_F$ breaking effects. Taking the $D\to Kπ$ and $D\to ππ$ modes as examples, we present our framework in detail. We find that the isospin relations in these decays are still valid in terms of triangle diagrams. Besides, the conclusions in the $D$ meson decays under the $SU(3)_F$ symmetry can be generalized to the $B$ meson decays under the $SU(4)_F$ symmetry.

preprint · 2022 · arXiv

From topological amplitude to rescattering dynamics in doubly charmed baryon decays

The doubly charmed baryon $Ξ^{++}_{cc}$ was observed by the LHCb Collaboration in 2017. The branching fractions of two-body doubly charmed baryon decays were predicted in the framework of the rescattering mechanism, and some $SU(3)_F$ relations were investigated in the topological amplitudes. In this work, we study the correlation between the topological diagram at quark level and the rescattering triangle diagram at hadron level in doubly charmed baryon decay. The completeness of our framework is confirmed by the fact that all twelve possible structures of meson-baryon scattering appear once each in the topological-scattering diagram, the intermediate form between the topological diagram and the triangle diagram. It is found that the triangle diagrams derived from the topological diagrams are consistent with the ones derived directly from the chiral Lagrangian. The relative magnitudes of rescattering contributions in the $C$, $C^\prime$, $E$, $E^\prime$, $P$ and $P^\prime$ diagrams extracted from $SU(3)_F$ symmetry are consistent with the numerical analysis in the literature. Taking the $Ξ^{++}_{cc}\to Ξ^+_cπ^+$, $Ξ^{+}_{cc}\to Ξ^0_cπ^+$ and $Ξ^{+}_{cc}\to Ξ^+_cπ^0$ modes as examples, we show that the isospin relation is satisfied in terms of triangle diagrams.

preprint · 2022 · arXiv

Heterogeneous Federated Learning

Federated learning learns from scattered data by fusing collaborative models from local nodes. However, due to chaotic information distribution, the model fusion may suffer from structural misalignment with regard to unmatched parameters. In this work, we propose a novel federated learning framework to resolve this issue by establishing a firm structure-information alignment across collaborative models. Specifically, we design a feature-oriented regulation method ($Ψ$-Net) to ensure explicit feature information allocation in different neural network structures. By applying this regulation method to collaborative models, matchable structures with similar feature information can be initialized at the very early training stage. During the federated learning process under either IID or non-IID scenarios, dedicated collaboration schemes further guarantee ordered information distribution with definite structure matching, and thus comprehensive model alignment. Eventually, this framework effectively enhances the applicability of federated learning to extensive heterogeneous settings, while providing excellent convergence speed, accuracy, and computation/communication efficiency.

preprint2022arXiv

Kullback-Leibler-Based Discrete Failure Time Models for Integration of Published Prediction Models with New Time-To-Event Dataset

Prediction of time-to-event data often suffers from rare event rates, small sample sizes, high dimensionality and low signal-to-noise ratios. Incorporating published prediction models from large-scale studies is expected to improve the performance of prognosis prediction on internal individual-level time-to-event data. However, existing integration approaches typically assume that underlying distributions from the external and internal data sources are similar, which is often invalid. To account for challenges including heterogeneity, data sharing, and privacy constraints, we propose a discrete failure time modeling procedure, which utilizes a discrete hazard-based Kullback-Leibler discriminatory information measuring the discrepancy between the published models and the internal dataset. Simulations show the advantage of the proposed method compared with those solely based on the internal data or published models. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-scale dataset with published survival models obtained from the national transplant registry.

preprint2022arXiv

Levi-type Schur-Sergeev duality for general linear super groups

In this note, we investigate a kind of double centralizer property for general linear supergroups. For the superspace $V=\mathbb{K}^{m\mid n}$ over an algebraically closed field $\mathbb{K}$ whose characteristic is not equal to $2$, we consider its $\mathbb{Z}_2$-homogeneous one-dimensional extension $\underline V=V\oplus\mathbb{K}v$, and the natural action of the supergroup $\tilde G:=\text{GL}(V)\times \textbf{G}_m$ on $\underline V$. Then we have the tensor product supermodule ($\underline{V}^{\otimes r}$, $ρ_r$) of $\tilde G$. We present a kind of generalized Schur-Sergeev duality, which states that the Schur superalgebras $S'(m|n,r)$ of $\tilde G$ and a so-called weak degenerate double Hecke algebra $\underline{\mathcal{H}}_r$ are double centralizers. The weak degenerate double Hecke algebra is an infinite-dimensional algebra, which has a natural representation on the tensor product space. This notion comes from \cite{B-Y-Y2020}, with a little modification.

preprint2022arXiv

Nonparametric Quantile Regression for Homogeneity Pursuit in Panel Data Models

Many panel data have the latent subgroup effect on individuals, and it is important to correctly identify these groups since the efficiency of resulting estimators can be improved significantly by pooling the information of individuals within each group. However, the currently assumed parametric and semiparametric relationship between the response and predictors may be misspecified, which leads to a wrong grouping result, and the nonparametric approach hence can be considered to avoid such mistakes. Moreover, the response may depend on predictors in different ways at various quantile levels, and the corresponding grouping structure may also vary. To tackle these problems, this article proposes a nonparametric quantile regression method for homogeneity pursuit in panel data models with individual effects, and a pairwise fused penalty is used to automatically select the number of groups. The asymptotic properties are established, and an ADMM algorithm is also developed. The finite sample performance is evaluated by simulation experiments, and the usefulness of the proposed methodology is further illustrated by an empirical example.

preprint2022arXiv

Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits

In this paper we investigate the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/sub-Gaussian reward distributions, we focus on the setting where each arm's reward distribution only has a $(1+v)$-th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $ε$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we establish the lower bound to show that the instance-dependent regret of our improved algorithm is optimal. In the second part, we study the problem in the $ε$-LDP model. We propose an algorithm that can be seen as a locally private and robust version of the SE algorithm, which provably achieves (near) optimal rates for both instance-dependent and instance-independent regret. Our results reveal differences between the problem of private MAB with bounded/sub-Gaussian rewards and heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might be applied to other related problems. Finally, experiments also support our theoretical findings and show the effectiveness of our algorithms.

preprint2022arXiv

Pay Attention to Hard Trials

Performance of speaker recognition systems is evaluated on test trials. Although as crucial as rulers for tailors, trials have not been carefully treated so far, and most existing benchmarks compose trials by naive cross-pairing. In this paper, we argue that the cross-pairing approach produces overwhelming easy trials, which in turn leads to potential bias in system and technique comparison. To solve the problem, we advocate more attention to hard trials. We present an SVM-based approach to identifying hard trials and use it to construct new evaluation sets for VoxCeleb1 and SITW. With the new sets, we can re-evaluate the contribution of some recent technologies. The code and the identified hard trials will be published online at http://project.cslt.org.

preprint2022arXiv

Rate-Optimal Robust Estimation of High-Dimensional Vector Autoregressive Models

High-dimensional time series data appear in many scientific areas in the current data-rich environment. Analysis of such data poses new challenges to data analysts because of not only the complicated dynamic dependence between the series, but also the existence of aberrant observations, such as missing values, contaminated observations, and heavy-tailed distributions. For high-dimensional vector autoregressive (VAR) models, we introduce a unified estimation procedure that is robust to model misspecification, heavy-tailed noise contamination, and conditional heteroscedasticity. The proposed methodology enjoys both statistical optimality and computational efficiency, and can handle many popular high-dimensional models, such as sparse, reduced-rank, banded, and network-structured VAR models. With proper regularization and data truncation, the estimation convergence rates are shown to be almost optimal in the minimax sense under a bounded $(2+2ε)$-th moment condition. When $ε\geq1$, the rates of convergence match those obtained under the sub-Gaussian assumption. Consistency of the proposed estimators is also established for some $ε\in(0,1)$, with minimax optimal convergence rates associated with $ε$. The efficacy of the proposed estimation methods is demonstrated by simulation and a U.S. macroeconomic example.

preprint2022arXiv

The SAMI Galaxy Survey: Using concentrated star-formation and stellar population ages to understand environmental quenching

We study environmental quenching using the spatial distribution of current star-formation and stellar population ages with the full SAMI Galaxy Survey. By using a star-formation concentration index [C-index, defined as log10(r_{50,Halpha}/r_{50,cont})], we separate our sample into regular galaxies (C-index>-0.2) and galaxies with centrally concentrated star-formation (SF-concentrated; C-index<-0.2). Concentrated star-formation is a potential indicator of galaxies currently undergoing 'outside-in' quenching. Our environments cover ungrouped galaxies, low-mass groups (M_200<10^12.5 M_sun), high-mass groups (M_200 in the range 10^{12.5-14} M_sun) and clusters (M_200>10^14 M_sun). We find the fraction of SF-concentrated galaxies increases as halo mass increases with 9\pm2 per cent, 8\pm3 per cent, 19\pm4 per cent and 29\pm4 per cent for ungrouped galaxies, low-mass groups, high-mass groups and clusters, respectively. We interpret these results as evidence for 'outside-in' quenching in groups and clusters. To investigate the quenching time-scale in SF-concentrated galaxies, we calculate light-weighted age (Age_L) and mass-weighted age (Age_M) using full spectral fitting, as well as the Dn4000 and Hdelta_A indices. We assume that the average galaxy age radial profile before entering a group or cluster is similar to ungrouped regular galaxies. At large radius (1-2 R_e), SF-concentrated galaxies in high-mass groups have older ages than ungrouped regular galaxies with an age difference of 1.83\pm0.38 Gyr for Age_L and 1.34\pm0.56 Gyr for Age_M. This suggests that while 'outside-in' quenching can be effective in groups, the process will not quickly quench the entire galaxy. In contrast, the ages at 1-2 R_e of cluster SF-concentrated galaxies and ungrouped regular galaxies are consistent (0.19\pm0.21 Gyr for Age_L, 0.40\pm0.61 Gyr for Age_M), suggesting the quenching process must be rapid.
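The C-index definition and the -0.2 split quoted in the abstract can be illustrated with a short sketch. This is not the survey's pipeline: the function names are hypothetical, the radii are taken to be the Halpha and continuum r_50 values in the same units, and assigning a galaxy exactly at -0.2 to the regular class is an assumption, since the abstract only gives strict inequalities.

```python
import math

# Threshold quoted in the abstract: C-index < -0.2 marks SF-concentrated galaxies.
C_INDEX_THRESHOLD = -0.2


def c_index(r50_halpha, r50_cont):
    """Return log10(r_{50,Halpha} / r_{50,cont}) for radii given in the same units."""
    if r50_halpha <= 0 or r50_cont <= 0:
        raise ValueError("radii must be positive")
    return math.log10(r50_halpha / r50_cont)


def classify(r50_halpha, r50_cont):
    """Label a galaxy as 'SF-concentrated' or 'regular'.

    Assumption: a value exactly equal to the threshold is treated as regular.
    """
    if c_index(r50_halpha, r50_cont) < C_INDEX_THRESHOLD:
        return "SF-concentrated"
    return "regular"


if __name__ == "__main__":
    # Hypothetical radii (arbitrary units), not survey measurements.
    for r_ha, r_cont in [(2.0, 5.0), (4.5, 5.0)]:
        print(r_ha, r_cont, round(c_index(r_ha, r_cont), 3), classify(r_ha, r_cont))
```

A galaxy whose Halpha radius is less than about 63 per cent of its continuum radius (10^-0.2 is roughly 0.63) falls in the SF-concentrated class under this split.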

preprint2022arXiv

The tracking detector of the FASER experiment

FASER is a new experiment designed to search for new light weakly-interacting long-lived particles (LLPs) and study high-energy neutrino interactions in the very forward region of the LHC collisions at CERN. The experimental apparatus is situated 480 m downstream of the ATLAS interaction-point aligned with the beam collision axis. The FASER detector includes four identical tracker stations constructed from silicon microstrip detectors. Three of the tracker stations form a tracking spectrometer, and enable FASER to detect the decay products of LLPs decaying inside the apparatus, whereas the fourth station is used for the neutrino analysis. The spectrometer has been installed in the LHC complex since March 2021, while the fourth station is not yet installed. FASER will start physics data taking when the LHC resumes operation in early 2022. This paper describes the design, construction and testing of the tracking spectrometer, including the associated components such as the mechanics, readout electronics, power supplies and cooling system.

preprint2022arXiv

The trigger and data acquisition system of the FASER experiment

The FASER experiment is a new small and inexpensive experiment that is placed 480 meters downstream of the ATLAS experiment at the CERN LHC. FASER is designed to capture decays of new long-lived particles, produced outside of the ATLAS detector acceptance. These rare particles can decay in the FASER detector together with about 500-1000 Hz of other particles originating from the ATLAS interaction point. A very high efficiency trigger and data acquisition system is required to ensure that the physics events of interest will be recorded. This paper describes the trigger and data acquisition system of the FASER experiment and presents performance results of the system acquired during initial commissioning.

preprint2022arXiv

Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration

Recent learning-based image fusion methods have made considerable progress on pre-registered multi-modality data, but suffer from serious ghosting when dealing with misaligned multi-modality data, due to spatial deformation and the difficulty of narrowing the cross-modality discrepancy. To overcome these obstacles, in this paper, we present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion (IVIF). Specifically, we propose a Cross-modality Perceptual Style Transfer Network (CPSTN) to generate a pseudo infrared image taking a visible image as input. Benefiting from the favorable geometry preservation ability of the CPSTN, the generated pseudo infrared image has a sharp structure, which is more conducive to transforming cross-modality image alignment into mono-modality registration coupled with the structure sensitivity of the infrared image. In this case, we introduce a Multi-level Refinement Registration Network (MRRN) to predict the displacement vector field between distorted and pseudo infrared images and reconstruct the registered infrared image under the mono-modality setting. Moreover, to better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM) to adaptively select more meaningful features for fusion in the Dual-path Interaction Fusion Network (DIFN). Extensive experimental results suggest that the proposed method achieves superior capability on misaligned cross-modality image fusion.

preprint2021arXiv

CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts

Rationality and emotion are two fundamental elements of humans. Endowing agents with rationality and emotion has been one of the major milestones in AI. However, in the field of conversational AI, most existing models only specialize in one aspect and neglect the other, which often leads to dull or unrelated responses. In this paper, we hypothesize that combining rationality and emotion into conversational agents can improve response quality. To test the hypothesis, we focus on one fundamental aspect of rationality, i.e., commonsense, and propose CARE, a novel model for commonsense-aware emotional response generation. Specifically, we first propose a framework to learn and construct commonsense-aware emotional latent concepts of the response given an input message and a desired emotion. We then propose three methods to collaboratively incorporate the latent concepts into response generation. Experimental results on two large-scale datasets support our hypothesis and show that our model can produce more accurate and commonsense-aware emotional responses and achieve better human ratings than state-of-the-art models that only specialize in one aspect.

preprint2021arXiv

Magneto-electric Tuning of Pinning-Type Permanent Magnets through Atomic-Scale Engineering of Grain Boundaries

Pinning-type magnets maintaining high coercivity, i.e. the ability to sustain magnetization, at high temperature are at the core of thriving clean-energy technologies. Among these, Sm2Co17-based magnets are excellent candidates owing to their high-temperature stability. However, despite decades of efforts to optimize the intragranular microstructure, the coercivity currently only reaches 20~30% of the theoretical limits. Here, the roles of the grain-interior nanostructure and the grain boundaries in controlling coercivity are disentangled by an emerging magneto-electric approach. Through hydrogen charging/discharging by applying voltages of only ~ 1 V, the coercivity is reversibly tuned by an unprecedented value of ~ 1.3 T. In situ magneto-structural measurements and atomic-scale tracking of hydrogen atoms reveal that the segregation of hydrogen atoms at the grain boundaries, rather than the change of the crystal structure, dominates the reversible and substantial change of coercivity. Hydrogen lowers the local magnetocrystalline anisotropy and facilitates the magnetization reversal starting from the grain boundaries. Our study reveals the previously neglected critical role of grain boundaries in the conventional magnetisation-switching paradigm, suggesting a critical reconsideration of strategies to overcome the coercivity limits in permanent magnets, via for instance atomic-scale grain boundary engineering.

preprint2021arXiv

VDPC: Variational Density Peak Clustering Algorithm

The widely applied density peak clustering (DPC) algorithm makes an intuitive cluster formation assumption that cluster centers are often surrounded by data points with lower local density and far away from other data points with higher local density. However, this assumption suffers from one limitation that it is often problematic when identifying clusters with lower density because they might be easily merged into other clusters with higher density. As a result, DPC may not be able to identify clusters with variational density. To address this issue, we propose a variational density peak clustering (VDPC) algorithm, which is designed to systematically and autonomously perform the clustering task on datasets with various types of density distributions. Specifically, we first propose a novel method to identify the representatives among all data points and construct initial clusters based on the identified representatives for further analysis of the clusters' property. Furthermore, we divide all data points into different levels according to their local density and propose a unified clustering framework by combining the advantages of both DPC and DBSCAN. Thus, all the identified initial clusters spreading across different density levels are systematically processed to form the final clusters. To evaluate the effectiveness of the proposed VDPC algorithm, we conduct extensive experiments using 20 datasets including eight synthetic, six real-world and six image datasets. The experimental results show that VDPC outperforms two classical algorithms (i.e., DPC and DBSCAN) and four state-of-the-art extended DPC algorithms.
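The DPC assumption stated in this abstract (centers have high local density and lie far from any denser point) can be made concrete with a minimal sketch of the two per-point scores used by baseline DPC. This illustrates only that baseline, not VDPC; the cutoff-count density, the function name `dpc_scores`, the parameter `d_c`, and the toy points are all assumptions chosen for illustration.

```python
import math


def dpc_scores(points, d_c):
    """Compute baseline DPC scores for each point.

    rho[i]   : number of other points closer than the cutoff distance d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               for a point with no denser neighbour, the largest distance
               from it to any other point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    # Two hypothetical, well-separated groups; indices 3 and 7 sit in the middle of each.
    pts = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
    rho, delta = dpc_scores(pts, d_c=1.0)
    # Candidate centers are the points with the largest rho * delta.
    ranked = sorted(range(len(pts)), key=lambda i: rho[i] * delta[i], reverse=True)
    print(ranked[:2])
```

Points that score high on both quantities are candidate centers. The limitation noted in the abstract arises because a sparse cluster's center has a low rho and is easily outranked by points in denser clusters.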

preprint2020arXiv

$p$-Norm Flow Diffusion for Local Graph Clustering

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, node ranking and feature inference. Prior work on local graph clustering mostly falls into two categories with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with $p$-norm network flow for $p\in (1,\infty)$. In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around the input seed set. In particular, we achieve quadratic approximation of conductance in the case of $p=2$ similar to the Cheeger-type bounds of spectral methods, constant factor approximation when $p\rightarrow\infty$ similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $p\ge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate that our approach compares favorably with existing methods.

preprint2020arXiv

A chip-scale oscillation-mode optomechanical inertial sensor near the thermodynamical limits

High-precision inertial sensing and gravity sensing are key in navigation, oil exploration, and earthquake prediction. In contrast to prior accelerometers using piezoelectric or electronic capacitance readout techniques, optical readout provides narrow-linewidth high-sensitivity laser detection along with low-noise resonant optomechanical transduction near the thermodynamical limits. Here an optomechanical inertial sensor with 8.2 micro-g/Hz^1/2 velocity random walk (VRW) at an acquisition rate of 100 Hz and 50.9 micro-g bias instability is demonstrated, suitable for consumer and industrial grade applications, e.g., inertial navigation, inclination sensing, platform stabilization, and/or wearable device motion detection. Driven into optomechanical sustained-oscillation, the slot photonic crystal cavity provides radio-frequency readout of the optically-driven transduction with enhanced 625 micro-g/Hz sensitivity. Measuring the optomechanically-stiffened oscillation shift, instead of the optical transmission shift, provides a 220x VRW enhancement over pre-oscillation mode detection due to the strong optomechanical transduction. Supported by theory, this inertial sensor operates 2.56x above the thermodynamical limit at small integration times, with 43-dB dynamic range, in a solid-state room-temperature readout architecture.

preprint2020arXiv

AntiDote: Attention-based Dynamic Optimization for Neural Network Runtime Efficiency

Convolutional Neural Networks (CNNs) have achieved great cognitive performance at the expense of considerable computation load. To relieve the computation load, many optimization works have been developed to reduce model redundancy by identifying and removing insignificant model components, such as weight sparsity and filter pruning. However, these works only evaluate model components' static significance with internal parameter information, ignoring their dynamic interaction with external inputs. With per-input feature activation, the model component significance can dynamically change, and thus the static methods can only achieve sub-optimal results. Therefore, we propose a dynamic CNN optimization framework in this work. Based on the neural network attention mechanism, we propose a comprehensive dynamic optimization framework including (1) testing-phase channel and column feature map pruning, as well as (2) training-phase optimization by targeted dropout. Such a dynamic optimization framework has several benefits: (1) it can accurately identify and aggressively remove per-input feature redundancy by considering the model-input interaction; (2) it can maximally remove the feature map redundancy in various dimensions thanks to the multi-dimension flexibility; (3) the training-testing co-optimization favors the dynamic pruning and helps maintain the model accuracy even with a very high feature pruning ratio. Extensive experiments show that our method could bring 37.4% to 54.5% FLOPs reduction with negligible accuracy drop on various test networks.

preprint2020arXiv

Charmed baryon decays in $SU(3)_F$ symmetry

In recent years, fruitful results on charmed baryons have been obtained by BESIII, Belle and LHCb. We investigate the two-body non-leptonic decays of charmed baryons in the flavor $SU(3)$ symmetry. Hundreds of amplitude relations are clearly provided, and are classified according to the $I$-, $U$- and $V$-spin symmetries. Among them, some amplitude relations are tested by the experimental data, or used to predict the branching fractions based on the exact flavor symmetry without any other approximation. Some relations of $K^0_S-K^0_L$ asymmetries and $CP$ asymmetries are obtained under the $U$-spin symmetry in the modes of charmed baryons decaying into neutral kaons. In addition, the $U$-spin breaking effect is explored in the $Λ_c^+\to Σ^+K^{*0}$ and $Ξ_c^+\to p\bar{K}^{*0}$ modes.

preprint2020arXiv

Cox-nnet v2.0: improved neural-network based survival prediction extended to large-scale EMR dataset

Cox-nnet is a neural-network based prognosis prediction method, originally applied to genomics data. Here we propose version 2 of Cox-nnet, with significant improvements in efficiency and interpretability, making it suitable for predicting prognosis based on large-scale electronic medical records (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. Applied to an EMR dataset of OPTN kidney transplantation, Cox-nnet v2.0 reduces the training time of Cox-nnet by up to 32-fold (n=10,000) and achieves better prediction accuracy than Cox-PH (p<0.05). Availability and implementation: Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0

preprint2020arXiv

Distributed Equivalent Substitution Training for Large-Scale Recommender Systems

We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. DES introduces fully synchronous training to large-scale recommendation systems for the first time by reducing communication, thus making the training of commercial recommender systems converge faster and reach better CTR. DES requires much less communication by substituting the weights-rich operators with computationally equivalent sub-operators and aggregating partial results instead of transmitting the huge sparse weights directly through the network. Due to the use of synchronous training on large-scale Deep Learning Recommendation Models (DLRMs), DES achieves higher AUC (Area Under ROC). We successfully apply DES training to multiple popular DLRMs in industrial scenarios. Experiments show that our implementation outperforms the state-of-the-art PS-based training framework, achieving up to 68.7% communication savings and higher throughput compared to other PS-based recommender systems.

preprint2020arXiv

Distributed Kernel Ridge Regression with Communications

This paper focuses on generalization performance analysis for distributed algorithms in the framework of learning theory. Taking distributed kernel ridge regression (DKRR) for example, we succeed in deriving its optimal learning rates in expectation and providing theoretically optimal ranges of the number of local processors. Due to the gap between theory and experiments, we also deduce optimal learning rates for DKRR in probability to essentially reflect the generalization performance and limitations of DKRR. Furthermore, we propose a communication strategy to improve the learning performance of DKRR and demonstrate the power of communications in DKRR via both theoretical assessments and numerical experiments.

preprint2020arXiv

ECG Beats Fast Classification Base on Sparse Dictionaries

Feature extraction plays an important role in Electrocardiogram (ECG) beat classification systems. Compared to other popular methods, the VQ method performs well in feature extraction from ECG with the advantage of dimensionality reduction. In the VQ method, a set of dictionaries corresponding to segments of ECG beats is trained, and VQ codes are used to represent each heartbeat. However, in practice, VQ codes optimized by k-means or k-means++ exhibit large quantization errors, which results in VQ codes for two heartbeats of the same type being very different. As a result, the essential differences between different types of heartbeats cannot be represented well. On the other hand, VQ uses too much data during codebook construction, which limits the speed of dictionary learning. In this paper, we propose a new method to improve the speed and accuracy of the VQ method. To reduce the computation of codebook construction, a set of sparse dictionaries corresponding to wave segments of ECG beats is constructed. After initialization, the sparse dictionaries are updated efficiently by the Feature-sign and Lagrange dual algorithms. Based on those dictionaries, a set of codes can be computed to represent the original ECG beats. Experimental results show that features extracted from ECG by our method are more efficient and separable. The accuracy of our method is higher than that of other methods, with less time spent on feature extraction.

preprint2020arXiv

EEG-Based Emotion Recognition Using Regularized Graph Neural Networks

Electroencephalography (EEG) measures the neuronal activities in different brain regions via electrodes. Many existing studies on EEG-based emotion recognition do not fully exploit the topology of EEG channels. In this paper, we propose a regularized graph neural network (RGNN) for EEG-based emotion recognition. RGNN considers the biological topology among different brain regions to capture both local and global relations among different EEG channels. Specifically, we model the inter-channel relations in EEG signals via an adjacency matrix in a graph neural network where the connection and sparseness of the adjacency matrix are inspired by neuroscience theories of human brain organization. In addition, we propose two regularizers, namely node-wise domain adversarial training (NodeDAT) and emotion-aware distribution learning (EmotionDL), to better handle cross-subject EEG variations and noisy labels, respectively. Extensive experiments on two public datasets, SEED and SEED-IV, demonstrate the superior performance of our model over state-of-the-art models in most experimental settings. Moreover, ablation studies show that the proposed adjacency matrix and two regularizers contribute consistent and significant gains to the performance of our RGNN model. Finally, investigations on the neuronal activities reveal important brain regions and inter-channel relations for EEG-based emotion recognition.

preprint2020arXiv

Electronic and magnetic properties of the Jahn-Teller active fluoride $\mathrm{NaCrF_{3}}$ from first-principles calculations

In perovskite-type compounds, the interplay of cooperative Jahn-Teller effect, electronic correlations and orbital degree of freedom leads to intriguing properties. $\mathrm{NaCrF_{3}}$ is a newly synthesized Jahn-Teller active fluoroperovskite where the $\mathrm{CrF_{6}^{4-}}$ octahedrons are considerably distorted. Based on the first-principles calculation, we analyze its electronic structure and magnetic properties. Our numerical results show that the $\mathrm{Cr^{2+}}$ ions adopt the high-spin $t_{2g\uparrow}^{3}e_{g\uparrow}^{1}$ configuration with $G$-type orbital ordering. We also estimate the magnetic exchange couplings and find that the in-plane and interplanar nearest-neighbor interactions are ferromagnetic and antiferromagnetic, respectively. The ground state of this material is $A$-type antiferromagnetic, in agreement with the experiments. Reasonable Curie-Weiss and $\mathrm{N\acute{e}el}$ temperatures compared to the experiments are given by mean-field approximation theory. Our results give a complete explanation of its electronic structure, magnetic and orbital order, and help to further comprehend the behaviors of Jahn-Teller active perovskite-type fluoride.

preprint2020arXiv

Enabling optical steganography, data storage, and encryption with plasmonic colors

Plasmonic color generation utilizing ultra-thin metasurfaces as well as metallic nanoparticles holds great promise for a wide range of applications, including color displays, data storage, and information encryption due to its high spatial resolution and mechanical/chemical stability. Most of the recently demonstrated systems generate static colors; however, more advanced applications such as data storage require fast and flexible means to tune the plasmonic colors, while keeping them vibrant and stable. Here, a surface-relief aluminum metasurface that reflects polarization-tunable plasmonic colors is designed and experimentally demonstrated. Excitation of localized surface plasmons encodes discrete combinations of the incident and reflected polarized light into diverse colors. A single storage unit - a nanopixel - stores a multiple-bit piece of information in the orientation of its constituent nanoantennae. This information is then reliably retrieved by inspecting the reflected color sequence with two linear polarizers. It is the broad color variability and high spatial resolution of the proposed encoding approach that supports a strong promise for rapid parallel read-out and encryption of high-density optical data. Our method also enables the robust generation of dynamic kaleidoscopic images with no detrimental "cross-talk" effect. The approach opens up a new route for advanced dynamic steganography, high-density parallel-access optical data storage, and optical information encryption.

preprint2020arXiv

Geno: A Developer Tool for Authoring Multimodal Interaction on Existing Web Applications

Supporting voice commands in applications presents significant benefits to users. However, adding such support to existing GUI-based web apps is effort-consuming with a high learning barrier, as shown in our formative study, due to the lack of unified support for creating multimodal interfaces. We present Geno, a developer tool for adding the voice input modality to existing web apps without requiring significant NLP expertise. Geno provides a high-level workflow for developers to specify functionalities to be supported by voice (intents), create language models for detecting intents and the relevant information (parameters) from user utterances, and fulfill the intents by either programmatically invoking the corresponding functions or replaying GUI actions on the web app. Geno further supports multimodal references to GUI context in voice commands (e.g., "move this [event] to next week" while pointing at an event with the cursor). In a study, developers with little NLP expertise were able to add multimodal voice command support for two existing web apps using Geno.

preprint 2020 arXiv

Heterogeneous-Temporal Graph Convolutional Networks: Make the Community Detection Much Better

Community detection has long been an important yet challenging task in analyzing complex networks, with a focus on detecting topological structures of graph data. Essentially, real-world graph data contain various features and node and edge types that vary dynamically over time, which invalidates most existing community detection approaches. To cope with these issues, this paper proposes heterogeneous-temporal graph convolutional networks (HTGCN) to detect communities from heterogeneous and temporal graphs. In particular, we first design a heterogeneous GCN component to acquire feature representations for each heterogeneous graph at each time step. Then, a residual compressed aggregation component is proposed to represent "dynamic" features for "varying" communities, which are then aggregated with "static" features extracted from the current graph. Extensive experiments are conducted on two real-world datasets, i.e., DBLP and IMDB. The promising results demonstrate that the proposed HTGCN is superior to both benchmark and state-of-the-art approaches, e.g., GCN, GAT, GNN, LGNN, HAN and STAR, with respect to a number of evaluation criteria.

preprint 2020 arXiv

Liquid Resource Types

This article presents liquid resource types, a technique for automatically verifying the resource consumption of functional programs. Existing resource analysis techniques trade automation for flexibility: automated techniques are restricted to relatively constrained families of resource bounds, while more expressive proof techniques admitting value-dependent bounds rely on handwritten proofs. Liquid resource types combine the best of these approaches, using logical refinements to automatically prove precise bounds on a program's resource consumption. The type system augments refinement types with potential annotations to conduct an amortized resource analysis. Importantly, users can annotate data structure declarations to indicate how potential is allocated within the type, allowing the system to express bounds with polynomials and exponentials, as well as more precise expressions depending on program values. We prove the soundness of the type system, provide a library of flexible and reusable data structures for conducting resource analysis, and use our prototype implementation to automatically verify resource bounds that previously required a manual proof.

preprint 2020 arXiv

Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations

Minimizing a convex risk function is the main step in many basic learning algorithms. We study protocols for convex optimization which provably leak very little about the individual data points that constitute the loss function. Specifically, we consider differentially private algorithms that operate in the local model, where each data record is stored on a separate user device and randomization is performed locally by those devices. We give new protocols for noninteractive LDP convex optimization, i.e., protocols that require only a single randomized report from each user to an untrusted aggregator. We study our algorithms' performance with respect to expected loss, either over the data set at hand (empirical risk) or a larger population from which our data set is assumed to be drawn. Our error bounds depend on the form of individuals' contribution to the expected loss. For the case of generalized linear losses (such as hinge and logistic losses), we give an LDP algorithm whose sample complexity is only linear in the dimensionality $p$ and quasipolynomial in other terms (the privacy parameters $ε$ and $δ$, and the desired excess risk $α$). This is the first algorithm for nonsmooth losses with sub-exponential dependence on $p$. For the Euclidean median problem, where the loss is given by the Euclidean distance to a given data point, we give a protocol whose sample complexity grows quasipolynomially in $p$. This is the first protocol with sub-exponential dependence on $p$ for a loss that is not a generalized linear loss. Our result for the hinge loss is based on a technique, dubbed polynomial of inner product approximation, which may be applicable to other problems. Our results for generalized linear losses and the Euclidean median are based on new reductions to the case of hinge loss.
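
To make the noninteractive local model concrete, the sketch below has each user send exactly one randomized report to an untrusted aggregator. It is a generic illustration, not the paper's polynomial-approximation protocol: it assumes a scalar private value in [0, 1], uses the standard Laplace mechanism with pure $ε$-privacy, and the names `local_report` and `aggregate_mean` are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def local_report(value, epsilon, rng):
    """One randomized report for a private value in [0, 1] (sensitivity 1)."""
    clipped = min(1.0, max(0.0, value))
    # Laplace noise with scale 1/epsilon gives epsilon-local differential privacy.
    return clipped + laplace_noise(1.0 / epsilon, rng)

def aggregate_mean(reports):
    """Untrusted aggregator: sees only noisy reports and averages them."""
    return sum(reports) / len(reports)

rng = random.Random(0)
private_values = [0.3] * 20000  # assumed toy data, one value per user
reports = [local_report(v, epsilon=1.0, rng=rng) for v in private_values]
estimate = aggregate_mean(reports)
```

Each individual report is dominated by noise, but the noise averages out across users, which is why sample complexity is the natural performance measure in this setting.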

preprint 2020 arXiv

Raising Expectations: Automating Expected Cost Analysis with Types

This article presents a type-based analysis for deriving upper bounds on the expected execution cost of probabilistic programs. The analysis is naturally compositional, parametric in the cost model, and supports higher-order functions and inductive data types. The derived bounds are multivariate polynomials that are functions of data structures. Bound inference is enabled by local type rules that reduce type inference to linear constraint solving. The type system is based on the potential method of amortized analysis and extends automatic amortized resource analysis (AARA) for deterministic programs. A main innovation is that bounds can contain symbolic probabilities, which may appear in data structures and function arguments. Another contribution is a novel soundness proof that establishes the correctness of the derived bounds with respect to a distribution-based operational cost semantics that also includes nontrivial diverging behavior. For cost models like time, derived bounds imply termination with probability one. To highlight the novel ideas, the presentation focuses on linear potential and a core language. However, the analysis is implemented as an extension of Resource Aware ML and supports polynomial bounds and user-defined data structures. The effectiveness of the technique is evaluated by analyzing the sample complexity of discrete distributions and with a novel average-case estimation for deterministic programs that combines expected cost analysis with statistical methods.

preprint 2020 arXiv

Strange Axial-vector Mesons in D Meson Decays

The nature of strange axial-vector mesons is not well understood and can be investigated in $D$ meson decays. In this work, it is found that the experimental data on $D^0\rightarrow K^\pm K^\mp_1(1270)(\to ρK \,\,\text{or}\,\,K^{*}π)$ in the $D^0\to K^+K^-π^+π^-$ mode disagree with the equality relation under the narrow width approximation and $CP$ conservation of strong decays. Considering other results on $K_1(1270)$ decays, the measured value of $\mathcal{B}(D^0\rightarrow K^-K^+_1(1270)(\to K^{*0}π^+))$ is probably overestimated by one order of magnitude. We then calculate the branching fractions of the corresponding processes with $K_1(1400)$ in the factorization approach, and find that $\mathcal{B}(D^0\rightarrow K^-K^+_1(1400)(\to K^{*0}π^+))$ is comparable to the predicted $\mathcal{B}(D^0\rightarrow K^-K^+_1(1270)(\to K^{*0}π^+))$ using the equality relation. In addition, we suggest measuring the ratios between $K_1(1270)\to ρK$ and $K^*π$, or testing the equality relations in other $D$ meson decay modes.

preprint 2020 arXiv

Towards Assessment of Randomized Smoothing Mechanisms for Certifying Adversarial Robustness

As a certified defensive technique, randomized smoothing has received considerable attention due to its scalability to large datasets and neural networks. However, several important questions remain unanswered, such as (i) whether the Gaussian mechanism is an appropriate option for certifying $\ell_2$-norm robustness, and (ii) whether there is an appropriate randomized (smoothing) mechanism to certify $\ell_\infty$-norm robustness. To shed light on these questions, we argue that the main difficulty is how to assess the appropriateness of each randomized mechanism. In this paper, we propose a generic framework that connects the existing frameworks of Lecuyer et al. (2018) and Li et al. (2019) to assess randomized mechanisms. Under our framework, for a randomized mechanism that can certify a certain extent of robustness, we define the magnitude of its required additive noise as the metric for assessing its appropriateness. We also prove lower bounds on this metric for the $\ell_2$-norm and $\ell_\infty$-norm cases as the criteria for assessment. Based on our framework, we assess the Gaussian and Exponential mechanisms by comparing the magnitude of additive noise required by these mechanisms and the lower bounds (criteria). We first conclude that the Gaussian mechanism is indeed an appropriate option to certify $\ell_2$-norm robustness. Surprisingly, we show that the Gaussian mechanism is also an appropriate option for certifying $\ell_\infty$-norm robustness, instead of the Exponential mechanism. Finally, we generalize our framework to $\ell_p$-norm for any $p\geq2$. Our theoretical findings are verified by evaluations on CIFAR10 and ImageNet.
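
For readers unfamiliar with randomized smoothing, the following minimal sketch shows the Gaussian mechanism in its simplest form: the smoothed prediction is the majority vote of a base classifier over Gaussian-perturbed copies of the input. It does not implement the paper's assessment framework or any certification bound; `base_classifier` is a hypothetical toy model and all parameter values are assumptions.

```python
import random
from collections import Counter

def base_classifier(x):
    # Hypothetical toy classifier: label 1 if the coordinate sum is positive.
    return 1 if sum(x) > 0 else 0

def smoothed_predict(x, sigma, n_samples=1000, seed=0):
    """Majority vote of the base classifier under isotropic Gaussian noise.

    Returns the winning label and the fraction of votes it received.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples

label, vote_share = smoothed_predict([2.0, 2.0], sigma=0.5)
```

The noise scale `sigma` is the quantity the abstract refers to as the magnitude of additive noise: larger values make the vote more stable under input perturbations but blur the base classifier's decisions.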

preprint 2019 arXiv

Terahertz dual phase gradient metasurface: high-efficiency binary-channel spoof surface plasmon excitation

Spoof surface plasmon (SSP) meta-couplers are compact antennas which link propagating waves and surface waves. However, most of them are designed with a fixed phase gradient and channel for incident waves with a specific polarization, which limits their further application in multichannel scenarios. In this Letter, we propose, to the best of our knowledge, a new method that combines the Brillouin folds theory with the generalized Snell's law. We demonstrate that when the phase gradient of the metasurface is large enough, the Brillouin folds effect occurs, creating a dual phase gradient space in a single metasurface. With this method, we design two novel terahertz meta-couplers with functionalities of symmetrical and asymmetrical binary-channel/bidirectional SSP excitation. Furthermore, finite element method (FEM) simulations are performed to demonstrate their functionalities. Considering the orthogonality of the incident waves, there can be a total of four independent space channels to excite SSPs on one metasurface. This work may open up new routes in multi-channel SSP meta-couplers and multi-beam surface wave antennas.
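
As background for the phase-gradient argument, the generalized Snell's law can be written in its standard textbook form as a momentum-matching condition. This is not the paper's derivation and does not cover the Brillouin folds effect; the symbols $k_0$, $\theta_i$, $\Phi(x)$, $k_x$ and $k_z$ are notation assumed here for a metasurface in free space.

```latex
% In-plane wavevector of the scattered wave, with k_0 = 2\pi/\lambda_0
% and \Phi(x) the phase imparted by the metasurface:
k_x = k_0 \sin\theta_i + \frac{d\Phi}{dx}

% Out-of-plane component in free space:
k_z = \sqrt{k_0^{2} - k_x^{2}}

% If |k_x| > k_0, then k_z is imaginary: the scattered wave is evanescent
% and bound to the surface rather than propagating, which is the regime
% a meta-coupler relies on to excite surface waves.
```

At normal incidence ($\theta_i = 0$) the coupling condition reduces to $|d\Phi/dx| > k_0$, which is one way to read the abstract's requirement that the phase gradient be large enough.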

preprint 2019 arXiv

XFe4Ge2 (X = Y, Lu) and Mn3Pt: Filling-enforced magnetic topological metals

Magnetism, coupled with nontrivial band topology, can bring about many interesting and exotic phenomena, so magnetic topological materials have attracted persistent research interest. However, compared with non-magnetic topological materials (TMs), magnetic TMs are less studied, since their magnetic structures and topological phase transitions are usually complex and first-principles predictions are usually sensitive to the effect of the Coulomb interaction. In this work, we present a comprehensive investigation of XFe4Ge2 (X = Y, Lu) and Mn3Pt, and find these materials to be filling-enforced magnetic topological metals. Our first-principles calculations show that XFe4Ge2 (X = Y, Lu) host Dirac points near the Fermi level at the high-symmetry point S. These Dirac points are protected by PT symmetry (P and T are inversion and time-reversal transformations, respectively) and a 2-fold screw rotation symmetry. Moreover, when PT symmetry is broken, the Dirac points split into Weyl nodes. Mn3Pt is found to host 4-fold degenerate band crossings along the whole high-symmetry path A-Z. We also utilize the GGA+U scheme to take into account the effect of Coulomb repulsion and find that the filling-enforced topological properties are naturally insensitive to U.