Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
73 works
0 followers
40 topics
4 close collaborators

Published work

73 published item(s)

preprint · 2026 · arXiv

SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

We present our shared task on evaluating the adaptability of LLMs and NLP systems across multiple languages and cultures. The task data consist of an extended version of our manually constructed BLEnD benchmark (Myung et al. 2024), covering more than 30 language-culture pairs, predominantly representing low-resource languages spoken across multiple continents. As the task is designed strictly for evaluation, participants were not permitted to use the data for training, fine-tuning, few-shot learning, or any other form of model modification. Our task includes two tracks: (a) Short-Answer Questions (SAQ) and (b) Multiple-Choice Questions (MCQ). Participants were required to predict labels and were allowed to submit any NLP system and adopt diverse modelling strategies, provided that the benchmark was used solely for evaluation. The task attracted more than 140 registered participants, and we received final submissions from 62 teams, along with 19 system description papers. We report the results and present an analysis of the best-performing systems and the most commonly adopted approaches. Furthermore, we discuss shared insights into open questions and challenges related to evaluation, misalignment, and methodological perspectives on model behaviour in low-resource languages and for under-represented cultures.

preprint · 2024 · arXiv

A Pure Integral-Type PLL with a Damping Branch to Enhance the Stability of Grid-Tied Inverter under Weak Grids

In a phase-locked loop (PLL) synchronized inverter, due to the strong nonlinear coupling between the PLL's parameters and the operating power angle, the equivalent damping coefficient deteriorates quickly when the power angle is close to 90° under an ultra-weak grid, which causes synchronous instability. To address this issue, this letter proposes a pure integral-type phase-locked loop (IPLL) with a damping branch to replace the traditional PI-type PLL. The equivalent damping coefficient of an IPLL-synchronized inverter is decoupled from the steady-state power angle. As a result, the IPLL-synchronized inverter can operate stably under an ultra-weak grid whenever the equilibrium point exists. Finally, time-domain simulation results verify the effectiveness and correctness of the proposed IPLL.

preprint · 2024 · arXiv

On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error

We study the Unbalanced Optimal Transport (UOT) between two measures of possibly different masses with at most $n$ components, where the marginal constraints of standard Optimal Transport (OT) are relaxed via Kullback-Leibler divergence with regularization factor $\tau$. Although only Sinkhorn-based UOT solvers have been analyzed in the literature, with an iteration complexity of ${O}\big(\tfrac{\tau\log(n)}{\varepsilon} \log\big(\tfrac{\log(n)}{\varepsilon}\big)\big)$ and a per-iteration cost of $O(n^2)$ for achieving the desired error $\varepsilon$, their positively dense output transportation plans strongly hinder their practicality. On the other hand, while being widely used as heuristics for computing UOT in modern deep learning applications and having shown success in sparse OT problems, gradient methods applied to UOT have not been formally studied. In this paper, we propose a novel algorithm based on the Gradient Extrapolation Method (GEM-UOT) to find an $\varepsilon$-approximate solution to the UOT problem in $O\big( \kappa\log\big(\frac{\tau n}{\varepsilon}\big) \big)$ iterations with $\widetilde{O}(n^2)$ per-iteration cost, where $\kappa$ is the condition number depending only on the two input measures. Our proof technique is based on a novel dual formulation of the squared $\ell_2$-norm UOT objective, which fills a gap in the sparse UOT literature and also leads to a new characterization of the approximation error between UOT and OT. To this end, we further present a novel approach to OT retrieval from UOT, which is based on GEM-UOT with fine-tuned $\tau$ and a post-process projection step. Extensive experiments on synthetic and real datasets validate our theories and demonstrate the favorable performance of our methods in practice.
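
As a rough illustration of the KL-relaxed objective described in this abstract (not the GEM-UOT algorithm itself), the sketch below evaluates a UOT cost for a given transport plan. The function names `generalized_kl` and `uot_objective`, and the choice of the generalized KL divergence for unnormalized measures, are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def generalized_kl(x, y):
    """Generalized KL divergence sum(x*log(x/y) - x + y); assumes x >= 0 and y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = x > 0  # convention: 0 * log(0 / y) = 0
    return float(np.sum(x[mask] * np.log(x[mask] / y[mask])) - x.sum() + y.sum())

def uot_objective(plan, cost, a, b, tau):
    """Transport cost plus tau-weighted KL penalties on both marginals (illustrative form)."""
    plan = np.asarray(plan, dtype=float)
    transport = float(np.sum(np.asarray(cost, dtype=float) * plan))
    row_penalty = generalized_kl(plan.sum(axis=1), a)  # mismatch with the source measure
    col_penalty = generalized_kl(plan.sum(axis=0), b)  # mismatch with the target measure
    return transport + tau * (row_penalty + col_penalty)
```

With balanced inputs and a plan whose marginals match `a` and `b` exactly, both penalties vanish and only the transport cost remains, which is the sense in which the relaxation reduces to standard OT.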

preprint · 2023 · arXiv

Extended Load Flexibility of Utility-Scale P2H Plants: Optimal Production Scheduling Considering Dynamic Thermal and HTO Impurity Effects

In the conversion toward a clean and sustainable energy system, the flexibility of power-to-hydrogen (P2H) production enables the admittance of volatile renewable energies on a utility scale and provides the connected electrical power system with ancillary services. To extend the load flexibility and thus improve the profitability of green hydrogen production, this paper presents an optimal production scheduling approach for utility-scale P2H plants composed of multiple alkaline electrolyzers. Unlike existing works, this work discards the conservative constant steady-state constraints and is the first to leverage the dynamic thermal and hydrogen-to-oxygen (HTO) impurity crossover processes of electrolyzers. Doing this optimizes their effects on the loading range and energy conversion efficiency, thereby improving the load flexibility of P2H production. The proposed multiphysics-aware scheduling model is formulated as a mixed-integer linear program (MILP). It coordinates the electrolyzers' operation state transitions and load allocation subject to comprehensive thermodynamic and mass transfer constraints. A decomposition-based solution method, SDM-GS-ALM, is then adopted to address the scalability issue for scheduling large-scale P2H plants composed of tens of electrolyzers. With an experiment-verified dynamic electrolyzer model, case studies with up to 22 electrolyzers show that the proposed method remarkably improves the hydrogen output and profit of P2H production powered by either solar or wind energy compared to the existing scheduling approach.

preprint · 2022 · arXiv

A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization

Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm by identifying its intrinsic potential function. Specifically, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity $\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a Łojasiewicz-type gradient inequality. An experiment on hyper-parameter optimization demonstrates the effectiveness of our algorithm.

preprint · 2022 · arXiv

A likelihood based sensitivity analysis for publication bias on summary ROC in meta-analysis of diagnostic test accuracy

In meta-analysis of diagnostic test accuracy, the summary receiver operating characteristic (SROC) is a recommended method to summarize the discriminant capacity of a diagnostic test in the presence of study-specific cutoff values, and the area under the SROC (SAUC) gives the aggregate measure of test accuracy. SROC or SAUC can be estimated by bivariate modelling of pairs of sensitivity and specificity over the primary diagnostic studies. However, publication bias is a major threat to the validity of estimates in meta-analysis. To address this issue, we propose to adopt sensitivity analysis to make an objective inference for the impact of publication bias on SROC or SAUC. We extend the Copas likelihood-based sensitivity analysis to the bivariate normal model used for meta-analysis of diagnostic test accuracy to evaluate how much SROC or SAUC would change with different selection probabilities under several selective publication mechanisms dependent on sensitivity and/or specificity. The selection probability is modelled by a selection function on a $t$-type statistic for the linear combination of logit-transformed sensitivity and specificity, allowing the selective publication of each study to be influenced by the cutoff-dependent $p$-value for sensitivity, specificity, or diagnostic odds ratio. By embedding the selection function into the bivariate normal model, the conditional likelihood is proposed and the bias-corrected SROC or SAUC can be estimated by maximizing the likelihood. We illustrate the proposed sensitivity analysis by reanalyzing a meta-analysis of test accuracy for intravascular device related infection. Simulation studies are conducted to investigate the performance of the proposed methods.

preprint · 2022 · arXiv

Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex Minimax Machine Learning

Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, where it aims to solve a nonconvex minimax optimization problem. However, existing studies show that it suffers from a high computation complexity in nonconvex minimax optimization. In this paper, we develop a single-loop and fast AltGDA-type algorithm that leverages proximal gradient updates and momentum acceleration to solve regularized nonconvex minimax optimization problems. By leveraging the momentum acceleration technique, we prove that the algorithm converges to a critical point in nonconvex minimax optimization and achieves a computation complexity in the order of $\mathcal{O}(\kappa^{\frac{11}{6}}\epsilon^{-2})$, where $\epsilon$ is the desired level of accuracy and $\kappa$ is the problem's condition number. Such a computation complexity improves on the state-of-the-art complexities of single-loop GDA and AltGDA algorithms (see the comparison table in the paper). We demonstrate the effectiveness of our algorithm via an experiment on adversarial deep learning.
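
To make the alternating update pattern concrete, here is a minimal sketch of plain AltGDA on a toy problem. It deliberately omits the proximal step and the momentum acceleration that the abstract describes, and the toy objective is convex-concave rather than nonconvex. The names `alt_gda`, `toy_grad_x` and `toy_grad_y` are hypothetical.

```python
def alt_gda(grad_x, grad_y, x0, y0, step_x=0.1, step_y=0.1, iters=200):
    """Plain alternating gradient descent-ascent on scalars (no proximal step, no momentum)."""
    x, y = float(x0), float(y0)
    for _ in range(iters):
        x = x - step_x * grad_x(x, y)  # descent step on the min variable
        y = y + step_y * grad_y(x, y)  # ascent step on the max variable, using the updated x
    return x, y

# Toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose saddle point is (0, 0).
def toy_grad_x(x, y):
    return x + y

def toy_grad_y(x, y):
    return x - y
```

The defining feature of the alternating scheme is that the ascent step sees the freshly updated `x`, in contrast to simultaneous GDA, where both variables are updated from the same iterate.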

preprint · 2022 · arXiv

Coordinated Frequency Control through Safe Reinforcement Learning

With the widespread deployment of renewables, electric power grids are experiencing increasing dynamics and uncertainties, and their secure operation is being threatened. Existing frequency control schemes based on day-ahead offline analysis and minute-level online sensitivity calculations are difficult to adapt to rapidly changing system states. In particular, they are unable to facilitate coordinated control of system frequency and power flows. A refined approach and tools are urgently needed to help system operators make timely decisions. This paper proposes a novel model-free coordinated frequency control framework based on safe reinforcement learning, with multiple control objectives considered. The load frequency control problem is modeled as a constrained Markov decision process, which can be solved by an AI agent continuously interacting with the grid to achieve sub-second decision making. Extensive numerical experiments conducted on the East China Power Grid demonstrate the effectiveness and promise of the proposed method.

preprint · 2022 · arXiv

Data Sampling Affects the Complexity of Online SGD over Dependent Data

Conventional machine learning applications typically assume that data samples are independently and identically distributed (i.i.d.). However, practical scenarios often involve a data-generating process that produces highly dependent data samples, which are known to heavily bias the stochastic optimization process and slow down the convergence of learning. In this paper, we conduct a fundamental study on how different stochastic data sampling schemes affect the sample complexity of online stochastic gradient descent (SGD) over highly dependent data. Specifically, with a $\phi$-mixing model of data dependence, we show that online SGD with proper periodic data-subsampling achieves an improved sample complexity over the standard online SGD in the full spectrum of the data dependence level. Interestingly, even subsampling a subset of data samples can accelerate the convergence of online SGD over highly dependent data. Moreover, we show that online SGD with mini-batch sampling can further substantially improve the sample complexity over online SGD with periodic data-subsampling over highly dependent data. Numerical experiments validate our theoretical results.
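
The periodic data-subsampling idea can be sketched on a toy stream as follows. This is only an illustration under stated assumptions: the AR(1) stream, the squared-error loss, and the names `ar1_stream` and `online_sgd_subsampled` are all invented for the example and do not come from the paper.

```python
import numpy as np

def ar1_stream(n, rho=0.9, mean=2.0, seed=0):
    """Generate a highly dependent AR(1) stream around `mean` (toy stand-in for dependent data)."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    z[0] = mean
    for t in range(1, n):
        z[t] = mean + rho * (z[t - 1] - mean) + rng.normal()
    return z

def online_sgd_subsampled(stream, period=1, step=0.05):
    """Online SGD on the loss 0.5*(theta - z)**2 that uses only every `period`-th sample."""
    theta = 0.0
    updates = 0
    for t, z in enumerate(stream):
        if t % period != 0:
            continue  # skip the samples between subsampling instants
        theta -= step * (theta - z)  # gradient of 0.5*(theta - z)**2 with respect to theta
        updates += 1
    return theta, updates
```

Setting `period=1` recovers standard online SGD, while a larger period spaces out the samples that are used, so consecutive updates see less correlated data at the price of fewer updates per sample drawn.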

preprint · 2022 · arXiv

DDDM: a Brain-Inspired Framework for Robust Classification

Despite their outstanding performance in a broad spectrum of real-world tasks, deep artificial neural networks are sensitive to input noise, particularly adversarial perturbations. In contrast, human and animal brains are much less vulnerable. Unlike the one-shot inference performed by most deep neural networks, the brain often solves decision-making with an evidence accumulation mechanism that may trade time for accuracy when facing noisy inputs. The mechanism is well described by the Drift-Diffusion Model (DDM). In the DDM, decision-making is modeled as a process in which noisy evidence is accumulated toward a threshold. Drawing inspiration from the DDM, we propose the Dropout-based Drift-Diffusion Model (DDDM), which combines test-phase dropout and the DDM to improve the robustness of arbitrary neural networks. The dropouts create temporally uncorrelated noise in the network that counters perturbations, while the evidence accumulation mechanism guarantees a reasonable decision accuracy. Neural networks enhanced with the DDDM and tested on image, speech, and text classification tasks all significantly outperform their native counterparts, demonstrating the DDDM as a task-agnostic defense against adversarial attacks.
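
A minimal sketch of the general idea (test-time dropout feeding an evidence accumulator with a decision threshold) might look like the following. It uses a toy linear classifier rather than a deep network, and the names `dropout_logits` and `accumulate_decision`, the centered-probability evidence, and the default threshold are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def dropout_logits(weights, x, keep_prob, rng):
    """One stochastic forward pass of a toy linear classifier with input dropout."""
    mask = rng.random(x.shape) < keep_prob
    return weights @ (x * mask / keep_prob)  # inverted-dropout scaling

def accumulate_decision(weights, x, threshold=5.0, keep_prob=0.8, max_steps=200, seed=0):
    """Accumulate per-pass class evidence until one class crosses the threshold."""
    rng = np.random.default_rng(seed)
    evidence = np.zeros(weights.shape[0])
    for step in range(1, max_steps + 1):
        logits = dropout_logits(weights, x, keep_prob, rng)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        evidence += probs - probs.mean()  # centered so an uninformative pass adds nothing
        if evidence.max() >= threshold:
            return int(evidence.argmax()), step
    return int(evidence.argmax()), max_steps  # fall back to the current leader
```

The returned step count plays the role of decision time: raising `threshold` makes the decision slower but less sensitive to any single noisy pass.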

preprint · 2022 · arXiv

Delving into the Estimation Shift of Batch Normalization in a Network

Batch normalization (BN) is a milestone technique in deep learning. It normalizes the activation using mini-batch statistics during training but the estimated population statistics during inference. This paper focuses on investigating the estimation of population statistics. We define the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and the expected ones. Our primary observation is that the estimation shift can accumulate due to the stacking of BN layers in a network, which has detrimental effects on test performance. We further find that a batch-free normalization (BFN) can block such an accumulation of estimation shift. These observations motivate our design of XBNBlock, which replaces one BN with BFN in the bottleneck block of residual-style networks. Experiments on the ImageNet and COCO benchmarks show that XBNBlock consistently improves the performance of different architectures, including ResNet and ResNeXt, by a significant margin and seems to be more robust to distribution shift.
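
The gap between estimated and actual population statistics can be illustrated with a small sketch. It assumes the common exponential-moving-average estimator used by BN implementations and measures the gap as a plain absolute difference. The paper's own definition of estimation shift magnitude is not reproduced here, and the names `running_estimates` and `estimation_shift` are hypothetical.

```python
import numpy as np

def running_estimates(batches, momentum=0.1):
    """Exponential moving averages of mini-batch mean and variance for one scalar feature."""
    mean, var = 0.0, 1.0  # typical initial values for running statistics
    for batch in batches:
        mean = (1 - momentum) * mean + momentum * batch.mean()
        var = (1 - momentum) * var + momentum * batch.var()
    return mean, var

def estimation_shift(est_mean, est_var, data):
    """Absolute gap between the running estimates and the statistics of the full data."""
    return abs(est_mean - data.mean()), abs(est_var - data.var())
```

Even when every mini-batch has exactly the population mean, the running mean after `k` batches still carries a residual `(1 - momentum)**k` share of its initial value, a simple single-layer source of mismatch between training-time and inference-time normalization.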

preprint · 2022 · arXiv

Desingularization and p-Curvature of Recurrence Operators

Linear recurrence operators in characteristic $p$ are classified by their $p$-curvature. For a recurrence operator $L$, denote by $\chi(L)$ the characteristic polynomial of its $p$-curvature. We can obtain information about the factorization of $L$ by factoring $\chi(L)$. The main theorem of this paper gives an unexpected relation between $\chi(L)$ and the true singularities of $L$. An application is to speed up a fast algorithm for computing $\chi(L)$ by desingularizing $L$ first. Another contribution of this paper is faster desingularization.

preprint · 2022 · arXiv

DeTrust-FL: Privacy-Preserving Federated Learning in Decentralized Trust Setting

Federated learning has emerged as a privacy-preserving machine learning approach where multiple parties can train a single model without sharing their raw training data. Federated learning typically requires the utilization of multi-party computation techniques to provide strong privacy guarantees by ensuring that an untrusted or curious aggregator cannot obtain isolated replies from parties involved in the training process, thereby preventing potential inference attacks. Until recently, it was thought that some of these secure aggregation techniques were sufficient to fully protect against inference attacks coming from a curious aggregator. However, recent research has demonstrated that a curious aggregator can successfully launch a disaggregation attack to learn information about model updates of a target party. This paper presents DeTrust-FL, an efficient privacy-preserving federated learning framework for addressing the lack of transparency that enables isolation attacks, such as disaggregation attacks, during secure aggregation by assuring that parties' model updates are included in the aggregated model in a private and secure manner. DeTrust-FL proposes a decentralized trust consensus mechanism and incorporates a recently proposed decentralized functional encryption (FE) scheme in which all parties agree on a participation matrix before collaboratively generating decryption key fragments, thereby gaining control and trust over the secure aggregation process in a decentralized setting. Our experimental evaluation demonstrates that DeTrust-FL outperforms state-of-the-art FE-based secure multi-party aggregation solutions in terms of training time and reduces the volume of data transferred. In contrast to existing approaches, this is achieved without creating any trust dependency on external trusted entities.

preprint · 2022 · arXiv

Extended Load Flexibility of Industrial P2H Plants: A Process Constraint-Aware Scheduling Approach

The operational flexibility of industrial power-to-hydrogen (P2H) plants enables the admittance of volatile renewable power and provides auxiliary regulatory services for the power grid. Aiming to extend the flexibility of the P2H plant further, this work presents a scheduling method that considers detailed process constraints of the alkaline electrolyzers. Unlike existing works that assume a constant load range, the presented scheduling framework fully exploits the dynamic processes of the electrolyzer, including temperature and hydrogen-to-oxygen (HTO) crossover, to improve operational flexibility. Varying energy conversion efficiency under different load levels and temperatures is also considered. After a suitable mathematical transformation, the scheduling model is solved as a mixed-integer linear program (MILP), which determines the on-off-standby states and power levels of the different electrolyzers in the P2H plant for daily operation. With experiment-verified constraints, a case study shows that, compared to the existing scheduling approach, the improved flexibility leads to a 1.627% profit increase when the P2H plant is directly coupled to photovoltaic power.

preprint · 2022 · arXiv

Extracting Densest Sub-hypergraph with Convex Edge-weight Functions

The densest subgraph problem (DSG), which aims to find an induced subgraph whose average edge weight is maximized, is a well-studied problem. However, when the input graph is a hypergraph, the existing notion of DSG fails to capture the fact that a hyperedge partially belonging to an induced sub-hypergraph is also a part of the sub-hypergraph. To resolve the issue, we suggest a function $f_e:\mathbb{Z}_{\ge0}\rightarrow \mathbb{R}_{\ge 0}$ to represent the partial edge-weight of a hyperedge $e$ in the input hypergraph $\mathcal{H}=(V,\mathcal{E},f)$ and formulate a generalized densest sub-hypergraph problem (GDSH) as $\max_{S\subseteq V}\frac{\sum_{e\in \mathcal{E}}{f_e(|e\cap S|)}}{|S|}$. We demonstrate that, when all the edge-weight functions are non-decreasing and convex, GDSH can be solved in polynomial time by a linear program-based algorithm, a network flow-based algorithm, and a greedy $\frac{1}{r}$-approximation algorithm, where $r$ is the rank of the input hypergraph. Finally, we investigate the computational tractability of GDSH where some edge-weight functions are non-convex.
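
The GDSH objective above can be checked directly on tiny instances by exhaustive search. The sketch below is such a brute-force evaluator, exponential in $|V|$ and intended only to make the formula concrete. It is not one of the polynomial-time algorithms mentioned in the abstract, and the name `gdsh_brute_force` is hypothetical.

```python
from itertools import combinations

def gdsh_brute_force(vertices, hyperedges, weight_fns):
    """Maximize sum_e f_e(|e ∩ S|) / |S| over all nonempty vertex subsets S."""
    best_density, best_set = float("-inf"), None
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            # Each hyperedge contributes according to how many of its vertices lie in S.
            total = sum(f(len(chosen & set(e))) for e, f in zip(hyperedges, weight_fns))
            density = total / size
            if density > best_density:
                best_density, best_set = density, chosen
    return best_density, best_set
```

For example, with hyperedges `(1, 2, 3)` and `(3, 4)` on four vertices and the convex weight `f(k) = k**2` for both, the subset `{1, 2, 3}` scores `(9 + 1) / 3`, where the second hyperedge contributes through its single vertex inside the subset.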

preprint · 2022 · arXiv

Generalized persistence of entropy weak solutions for system of hyperbolic conservation laws

Let $u(t,x)$ be the solution to the Cauchy problem of a scalar conservation law in one space dimension. It is well known that even for smooth initial data the solution can become discontinuous in finite time, and the global entropy weak solution can at best lie in the space of functions of bounded total variation. It is impossible for the solutions to belong to, for example, $H^1$, because by the Sobolev embedding theorem $H^1$ functions are Hölder continuous. However, we note that from any point $(t,x)$ we can draw a generalized characteristic downward which meets the initial axis at $y=\alpha(t,x)$. If we regard $u$ as a function of $(t,y)$, it indeed belongs to $H^1$ as a function of $y$ if the initial data belongs to $H^1$. We may call this generalized persistence (of high regularity) of the entropy weak solutions. The main purpose of this paper is to prove some kinds of generalized persistence (of high regularity) for the scalar and $2\times 2$ Temple system of hyperbolic conservation laws in one space dimension.

preprint · 2022 · arXiv

Inhomogeneous superconducting states in two weakly linked superconducting ultra thin films

A sufficiently large parallel magnetic field will generate staggered supercurrent loops and a superfluid density wave in two weakly linked superconducting (SC) ultrathin films, resulting in an inhomogeneous Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state. The SC order parameter of such an FFLO state is characterized by Bloch wave functions, and the state is called the "Bloch SC state". The staggered supercurrent loops form an array of Josephson vortex-antivortex pairs, instead of the usual Josephson vortex lattice. Enclosing a unit cell of the array, London's fluxoid is quantized as $\Phi^{\prime}=\Phi_0=hc/2e$, while the net orbital magnetization caused by the staggered supercurrent is zero. Meanwhile, a small parallel magnetic field gives rise to a Fulde-Ferrell (FF) state that has uniform superfluid density. The phase transition between the Bloch SC state and the FF state belongs to the universality class of two-dimensional commensurate-incommensurate transitions. An analytical solution in terms of Jacobian elliptic functions is found to be an excellent approximation to the Bloch SC order parameter.

preprint · 2022 · arXiv

Learning Visibility for Robust Dense Human Body Estimation

Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture the human silhouettes due to the limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices in the x, y, and z axes separately. The visibility in the x and y axes helps distinguish out-of-frame cases, and the visibility in the depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.

preprint · 2022 · arXiv

Matrix product states for Hartree-Fock-Bogoliubov wave functions

We provide an efficient and accurate method for converting Hartree-Fock-Bogoliubov wave functions into matrix product states (MPSs). These wave functions, also known as Bogoliubov vacua, exhibit a peculiar entanglement structure in that the eigenvectors of the reduced density matrix are also Bogoliubov vacua. We exploit this important feature to obtain their optimal MPS approximation and derive an explicit formula for the corresponding MPS matrices. The performance of our method is benchmarked with the Kitaev chain and the Majorana-Hubbard model on the honeycomb lattice. The approach facilitates the applications of Hartree-Fock-Bogoliubov wave functions and is ideally suited for combining with the density-matrix renormalization group method.

preprint · 2022 · arXiv

Plasma Image Classification Using Cosine Similarity Constrained CNN

Plasma jets are widely investigated both in the laboratory and in nature. Astrophysical objects such as black holes, active galactic nuclei, and young stellar objects commonly emit plasma jets in various forms. With the availability of data from plasma jet experiments resembling astrophysical plasma jets, classification of such data would potentially aid in investigating not only the underlying physics of the experiments but also astrophysical jets. In this work we use deep learning to process all of the laboratory plasma images from the Caltech Spheromak Experiment spanning two decades. We found that cosine similarity can aid in feature selection, classify images through comparison of feature vector direction, and be used as a loss function for the training of AlexNet for plasma image classification. We also develop a simple vector direction comparison algorithm for binary and multi-class classification. Using our algorithm we demonstrate 93% accurate binary classification to distinguish unstable columns from stable columns and 92% accurate five-way classification of a small, labeled data set which includes three classes corresponding to varying levels of kink instability.
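
Classification by comparing feature vector directions can be sketched in a few lines. This is a generic nearest-direction classifier, assuming one reference vector per class. The paper's actual algorithm, features, and class labels are not reproduced, and the names `cosine_similarity` and `classify_by_direction` are hypothetical.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify_by_direction(feature, class_refs):
    """Assign the label whose reference vector points most nearly in the same direction."""
    scores = {label: cosine_similarity(feature, ref) for label, ref in class_refs.items()}
    return max(scores, key=scores.get), scores
```

Because cosine similarity ignores vector length, two feature vectors that differ only by a positive scale factor receive the same label under this rule.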

preprint · 2022 · arXiv

Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either do not preserve the privacy of agents or are not sample and communication-efficient. In this work, we develop two decentralized AC and natural AC (NAC) algorithms that are private, and sample and communication-efficient. In both algorithms, agents share noisy information to preserve privacy and adopt mini-batch updates to improve sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}\big(\epsilon^{-2}\ln(\epsilon^{-1})\big)$ and $\mathcal{O}\big(\epsilon^{-3}\ln(\epsilon^{-1})\big)$, respectively, and the same small communication complexity $\mathcal{O}\big(\epsilon^{-1}\ln(\epsilon^{-1})\big)$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithm.

preprint · 2022 · arXiv

Sense Embeddings are also Biased--Evaluating Social Biases in Static and Contextualised Sense Embeddings

Sense embedding learning methods learn different embeddings for the different senses of an ambiguous word. One sense of an ambiguous word might be socially biased while its other senses remain unbiased. In comparison to the numerous prior works evaluating the social biases in pretrained word embeddings, the biases in sense embeddings have been relatively understudied. We create a benchmark dataset for evaluating the social biases in sense embeddings and propose novel sense-specific bias evaluation measures. We conduct an extensive evaluation of multiple static and contextualised sense embeddings for various types of social biases using the proposed measures. Our experimental results show that even in cases where no biases are found at word-level, there still exist worrying levels of social biases at sense-level, which are often ignored by the word-level bias evaluation measures.

preprint · 2022 · arXiv

Single-shot Hyper-parameter Optimization for Federated Learning: A General Algorithm & Analysis

We address the relatively unexplored problem of hyper-parameter optimization (HPO) for federated learning (FL-HPO). We introduce Federated Loss SuRface Aggregation (FLoRA), a general FL-HPO solution framework that can address use cases of tabular data and any Machine Learning (ML) model, including gradient boosting training algorithms, and therefore further expands the scope of FL-HPO. FLoRA enables single-shot FL-HPO: identifying a single set of good hyper-parameters that are subsequently used in a single FL training. Thus, it enables FL-HPO solutions with minimal additional communication overhead compared to FL training without HPO. We theoretically characterize the optimality gap of FL-HPO, which explicitly accounts for the heterogeneous non-IID nature of the parties' local data distributions, a dominant characteristic of FL systems. Our empirical evaluation of FLoRA for multiple ML algorithms on seven OpenML datasets demonstrates significant model accuracy improvements over the considered baseline, and robustness to an increasing number of parties involved in FL-HPO training.

preprint · 2022 · arXiv

Specificity-preserving RGB-D Saliency Detection

Salient object detection (SOD) on RGB and depth images has attracted more and more research interest, due to its effectiveness and the fact that depth cues can now be conveniently captured. Existing RGB-D SOD models usually adopt different fusion strategies to learn a shared representation from the two modalities (i.e., RGB and depth), while few methods explicitly consider how to preserve modality-specific characteristics. In this study, we propose a novel framework, termed SPNet (Specificity-preserving network), which benefits SOD performance by exploring both the shared information and modality-specific properties (e.g., specificity). Specifically, we propose to adopt two modality-specific networks and a shared learning network to generate individual and shared saliency prediction maps, respectively. To effectively fuse cross-modal features in the shared learning network, we propose a cross-enhanced integration module (CIM) and then propagate the fused feature to the next layer for integrating cross-level information. Moreover, to capture rich complementary multi-modal information for boosting the SOD performance, we propose a multi-modal feature aggregation (MFA) module to integrate the modality-specific features from each individual decoder into the shared decoder. By using a skip connection, the hierarchical features between the encoder and decoder layers can be fully combined. Extensive experiments demonstrate that our SPNet outperforms cutting-edge approaches on six popular RGB-D SOD and three camouflaged object detection benchmarks. The project is publicly available at: https://github.com/taozh2017/SPNet.

preprint2022arXiv

Two-dimensional superconductivity at the surfaces of KTaO3 gated with ionic liquid

The recent observation of superconductivity at the interfaces between KTaO3 and EuO (or LaAlO3) offers a new example of emergent phenomena at oxide interfaces. This superconductivity exhibits an unusually strong dependence on the crystalline orientation of KTaO3, and its superconducting transition temperature Tc is nearly one order of magnitude higher than that of the seminal LaAlO3/SrTiO3 interface. To understand its mechanism, it is crucial to address whether the formation of oxide interfaces is indispensable for the presence of superconductivity. Here, by exploiting ionic liquid (IL) gating, we obtain superconductivity at KTaO3 (111) and (110) surfaces with Tc up to 2.0 K and 1.0 K, respectively. This oxide-interface-free superconductivity gives clear experimental evidence that the essential physics of KTaO3 interface superconductivity lies in the KTaO3 surfaces doped with electrons. Moreover, the ability to control superconductivity at surfaces with IL provides a simple way to study the intrinsic superconductivity in KTaO3.

preprint2022arXiv

UNISON: Unpaired Cross-lingual Image Captioning

Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (i) a cross-lingual auto-encoding process, which utilizes a sentence parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and decode sentences in the target language, and (ii) a cross-modal unsupervised feature mapping, which seeks to map the encoded scene graph features from image modality to language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach.

preprint2022arXiv

Unveiling a critical stripy state in the triangular-lattice SU(4) spin-orbital model

The simplest spin-orbital model can host a nematic spin-orbital liquid state on the triangular lattice. We provide clear evidence that the ground state of the SU(4) Kugel-Khomskii model on the triangular lattice can be well described by a "single" Gutzwiller projected wave function with an emergent parton Fermi surface, although it exhibits strong finite-size effects in quasi-one-dimensional cylinders. The finite-size effects can be resolved by the fact that the parton Fermi surface consists of open orbits in the reciprocal space. Thereby, a stripy liquid state is expected in the two-dimensional limit, which preserves the SU(4) symmetry while breaking the translational symmetry by doubling the unit cell along one of the lattice vector directions. There are indications that these stripes are critical and that the central charge is $c=3$, in agreement with the SU(4)$_1$ Wess-Zumino-Witten conformal field theory. All these results are consistent with the Lieb-Schultz-Mattis-Oshikawa-Hastings theorem.

preprint2021arXiv

Curse or Redemption? How Data Heterogeneity Affects the Robustness of Federated Learning

Data heterogeneity has been identified as one of the key features in federated learning but is often overlooked through the lens of robustness to adversarial attacks. This paper focuses on characterizing and understanding its impact on backdooring attacks in federated learning through comprehensive experiments using synthetic and the LEAF benchmarks. The initial impression driven by our experimental results suggests that data heterogeneity is the dominant factor in the effectiveness of attacks and that it may be a redemption for defending against backdooring, as it makes the attack less efficient, makes effective attack strategies more challenging to design, and makes the attack result less predictable. However, with further investigations, we found data heterogeneity is more of a curse than a redemption, as the attack effectiveness can be significantly boosted by simply adjusting the client-side backdooring timing. More importantly, data heterogeneity may result in overfitting at the local training of benign clients, which can be utilized by attackers to disguise themselves and fool skewed-feature based defenses. In addition, effective attack strategies can be made by adjusting attack data distribution. Finally, we discuss the potential directions of defending against the curses brought by data heterogeneity. The results and lessons learned from our extensive experiments and analysis offer new insights for designing robust federated learning methods and systems.

preprint2021arXiv

Emergence of high-temperature superconductivity at the interface of two Mott insulators

Interfacial superconductivity has manifested itself in various types of heterostructures: band insulator-band insulator, band insulator-Mott insulator, and Mott insulator-metal. We report the observation of high-temperature superconductivity (HTS) in a complementary and long expected type of heterostructure, which consists of two Mott insulators, La2CuO4 (LCO) and PrBa2Cu3O7 (PBCO). By carefully controlling the oxidization condition and selectively doping CuO2 planes with Fe atoms, which suppress superconductivity, we found that the superconductivity arises at the LCO side and is confined within no more than two unit cells (about 2.6 nm) near the interface. If excess oxygen is present during growth, the superconductivity overcomes the Fe barrier. Some possible mechanisms for the interfacial HTS are discussed, and we attribute it to the redistribution of oxygen.

preprint2021arXiv

Event-based Motion Segmentation with Spatio-Temporal Graph Cuts

Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By contrast, event-based cameras are novel bio-inspired sensors that offer advantages to overcome such limitations. They report pixelwise intensity changes asynchronously, which enables them to acquire visual information at exactly the same rate as the scene dynamics. We develop a method to identify independently moving objects acquired with an event-based camera, i.e., to solve the event-based motion segmentation problem. We cast the problem as an energy minimization one involving the fitting of multiple motion models. We jointly solve two subproblems, namely event cluster assignment (labeling) and motion model fitting, in an iterative manner by exploiting the structure of the input event data in the form of a spatio-temporal graph. Experiments on available datasets demonstrate the versatility of the method in scenes with different motion patterns and number of moving objects. The evaluation shows state-of-the-art results without having to predetermine the number of expected moving objects. We release the software and dataset under an open source licence to foster research in the emerging topic of event-based motion segmentation.

preprint2021arXiv

Global existence for semilinear wave equations with scaling invariant damping in 3-D

Global existence for small data Cauchy problem of semilinear wave equations with scaling invariant damping in 3-D is established in this work, assuming that the data are radial and the constant in front of the damping belongs to $[1.5, 2)$. The proof is based on a weighted $L^2-L^2$ estimate for inhomogeneous wave equation, which is established by interpolating between energy estimate and Morawetz type estimate.

preprint2021arXiv

Graph topology invariant gradient and sampling complexity for decentralized and stochastic optimization

One fundamental problem in decentralized multi-agent optimization is the trade-off between gradient/sampling complexity and communication complexity. We propose new algorithms whose gradient and sampling complexities are graph topology invariant while their communication complexities remain optimal. For convex smooth deterministic problems, we propose a primal dual sliding (PDS) algorithm that computes an $ε$-solution with $O((\tilde{L}/ε)^{1/2})$ gradient and $O((\tilde{L}/ε)^{1/2}+\|\mathcal{A}\|/ε)$ communication complexities, where $\tilde{L}$ is the smoothness parameter of the objective and $\mathcal{A}$ is related to either the graph Laplacian or the transpose of the oriented incidence matrix of the communication network. The results can be improved to $O((\tilde{L}/μ)^{1/2}\log(1/ε))$ and $O((\tilde{L}/μ)^{1/2}\log(1/ε) + \|\mathcal{A}\|/ε^{1/2})$ respectively with $μ$-strong convexity. We also propose a stochastic variant, the primal dual sliding (SPDS) algorithm for problems with stochastic gradients. The SPDS algorithm utilizes the mini-batch technique and enables the agents to perform sampling and communication simultaneously. It computes a stochastic $ε$-solution with $O((\tilde{L}/ε)^{1/2} + (σ/ε)^2)$ sampling complexity, which can be improved to $O((\tilde{L}/μ)^{1/2}\log(1/ε) + σ^2/ε)$ with strong convexity. Here $σ^2$ is the variance. The communication complexities of SPDS remain the same as that of the deterministic case. All the aforementioned gradient and sampling complexities match the lower complexity bounds for centralized convex smooth optimization and are independent of the network structure. To the best of our knowledge, these gradient and sampling complexities have not been obtained before for decentralized optimization over a constraint feasible set.

preprint2021arXiv

Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification

Chest X-rays are an important and accessible clinical imaging tool for the detection of many thoracic diseases. Over the past decade, deep learning, with a focus on the convolutional neural network (CNN), has become the most powerful computer-aided diagnosis technology for improving disease identification performance. However, training an effective and robust deep CNN usually requires a large amount of data with high annotation quality. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. Thus, existing public chest X-ray datasets usually adopt language pattern based methods to automatically mine labels from reports. However, this results in label uncertainty and inconsistency. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods from two perspectives to improve a single model's disease identification performance, rather than focusing on an ensemble of models. MODL integrates multiple models to obtain a soft label distribution for optimizing the single target model, which can reduce the effects of original label uncertainty. Moreover, KNNS aims to enhance the robustness of the target model to provide consistent predictions on images with similar medical findings. Extensive experiments on the public NIH Chest X-ray and CheXpert datasets show that our model achieves consistent improvements over the state-of-the-art methods.

preprint2021arXiv

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems. In order to achieve convergent policy parameters for minimax optimization, it is important that GDA generates convergent variable sequences rather than convergent sequences of function values or gradient norms. However, the variable convergence of GDA has been proved only under convexity geometries, and its behavior in general nonconvex minimax optimization is not well understood. This paper fills this gap by studying the convergence of a more general proximal-GDA for regularized nonconvex-strongly-concave minimax optimization. Specifically, we show that proximal-GDA admits a novel Lyapunov function, which monotonically decreases in the minimax optimization process and drives the variable sequence to a critical point. By leveraging this Lyapunov function and the KŁ geometry that parameterizes the local geometries of general nonconvex functions, we formally establish the variable convergence of proximal-GDA to a critical point $x^*$, i.e., $x_t\to x^*, y_t\to y^*(x^*)$. Furthermore, over the full spectrum of the KŁ-parameterized geometry, we show that proximal-GDA achieves different types of convergence rates ranging from sublinear convergence up to finite-step convergence, depending on the geometry associated with the KŁ parameter. This is the first theoretical result on the variable convergence for nonconvex minimax optimization.
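
To make the proximal-GDA update concrete, the following is a minimal sketch on a toy problem, not the paper's algorithm or experiments. It assumes the objective f(x, y) = x*y - y^2/2 (strongly concave in y, and for simplicity linear rather than nonconvex in x), an l1 regularizer g(x) = lam*|x| whose proximal map is soft-thresholding, an alternating update order, and hand-picked step sizes. The names `soft_threshold` and `proximal_gda` are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal map of t * |x|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gda(x0=1.0, y0=0.0, lam=0.1, eta_x=0.05, eta_y=0.5, steps=2000):
    """Toy proximal gradient descent-ascent (all constants are assumptions).

    Solves min_x max_y  x*y - 0.5*y**2 + lam*|x|.
    Here y*(x) = x, so the primal function is 0.5*x**2 + lam*|x|,
    whose unique critical point is x = 0 (with y*(0) = 0).
    """
    x, y = x0, y0
    for _ in range(steps):
        grad_x = y                      # d/dx of x*y - 0.5*y^2
        # Proximal descent step on x (gradient step, then prox of the regularizer).
        x = soft_threshold(x - eta_x * grad_x, eta_x * lam)
        grad_y = x - y                  # d/dy of x*y - 0.5*y^2, at the new x
        # Plain ascent step on y.
        y = y + eta_y * grad_y
    return x, y


if __name__ == "__main__":
    x_final, y_final = proximal_gda()
    print("x:", x_final, "y:", y_final)
```

In this toy run the iterates themselves (not only the function values) settle at the critical point, which is the kind of variable convergence the abstract refers to.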

preprint2021arXiv

Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data

This paper presents a novel framework to build a voice conversion (VC) system by learning from a text-to-speech (TTS) synthesis system, which we call TTS-VC transfer learning. We first develop a multi-speaker speech synthesis system with sequence-to-sequence encoder-decoder architecture, where the encoder extracts robust linguistic representations of text, and the decoder, conditioned on target speaker embedding, takes the context vectors and the attention recurrent network cell output to generate target acoustic features. We take advantage of the fact that a TTS system maps input text to speaker-independent context vectors, and reuse such a mapping to supervise the training of latent representations of an encoder-decoder voice conversion system. In the voice conversion system, the encoder takes speech instead of text as input, while the decoder is functionally similar to the TTS decoder. As we condition the decoder on speaker embedding, the system can be trained on non-parallel data for any-to-any voice conversion. During voice conversion training, we present text to the speech synthesis network and speech to the voice conversion network. At run-time, the voice conversion network uses its own encoder-decoder architecture. Experiments show that the proposed approach consistently outperforms two competitive voice conversion baselines, namely phonetic posteriorgram and variational autoencoder methods, in terms of speech quality, naturalness, and speaker similarity.

preprint2020arXiv

Accelerating Power Methods for Higher-order Markov Chains

Higher-order Markov chains play a very important role in many fields, ranging from multilinear PageRank to financial modeling. In this paper, we propose three accelerated higher-order power methods for computing the limiting probability distribution of higher-order Markov chains, namely higher-order power method with momentum and higher-order quadratic extrapolation method. The convergence results are established, and numerical experiments are reported to show the efficiency of the proposed algorithms. In particular, the non-parametric quadratic extrapolation method is very competitive, and outperforms state-of-the-art competitors.
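
As a rough illustration of a higher-order power iteration with a momentum term, here is a minimal sketch. It is not the paper's algorithm: it assumes a second-order chain whose limiting distribution is taken to be a fixed point of x = P x x (the rank-one model used in multilinear PageRank), a heavy-ball style momentum term beta * (x_k - x_{k-1}) followed by a projection back to the probability simplex, and a synthetic, strongly mixing transition tensor. The names `make_transition_tensor`, `apply_tensor`, `project`, and `power_method_momentum` are hypothetical.

```python
import random


def make_transition_tensor(n, alpha=0.3, seed=0):
    """Synthetic tensor with P[i][j][k] = Prob(next=i | current=j, previous=k).

    Each column is a mix of a random distribution (weight alpha) and the
    uniform distribution, which keeps the fixed-point map contractive.
    """
    rng = random.Random(seed)
    P = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for k in range(n):
            col = [rng.random() for _ in range(n)]
            s = sum(col)
            for i in range(n):
                P[i][j][k] = alpha * col[i] / s + (1.0 - alpha) / n
    return P


def apply_tensor(P, x):
    """Compute (P x x)_i = sum_{j,k} P[i][j][k] * x[j] * x[k]."""
    n = len(x)
    return [
        sum(P[i][j][k] * x[j] * x[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def project(x):
    """Clip negative entries and renormalize so the entries sum to one."""
    clipped = [max(v, 0.0) for v in x]
    s = sum(clipped)
    return [v / s for v in clipped]


def power_method_momentum(P, beta=0.1, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = project(P x_k x_k + beta * (x_k - x_{k-1}))."""
    n = len(P)
    x_prev = [1.0 / n] * n
    x = x_prev[:]
    for it in range(1, max_iter + 1):
        y = apply_tensor(P, x)
        x_new = project([y[i] + beta * (x[i] - x_prev[i]) for i in range(n)])
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, it
        x_prev, x = x, x_new
    return x, max_iter


if __name__ == "__main__":
    P = make_transition_tensor(4)
    x, iters = power_method_momentum(P)
    print("iterations:", iters)
    print("distribution:", x)
```

Setting `beta=0.0` recovers the plain higher-order power iteration, which makes it easy to compare iteration counts on the same tensor.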

preprint2020arXiv

An Investigation into the Stochasticity of Batch Whitening

Batch Normalization (BN) is extensively employed in various network architectures by performing standardization within mini-batches. A full understanding of the process has been a central target in the deep learning communities. Unlike existing works, which usually only analyze the standardization operation, this paper investigates the more general Batch Whitening (BW). Our work originates from the observation that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs). We attribute this phenomenon to the stochasticity that BW introduces. We quantitatively investigate the stochasticity of different whitening transformations and show that it correlates well with the optimization behaviors during training. We also investigate how stochasticity relates to the estimation of population statistics during inference. Based on our analysis, we provide a framework for designing and comparing BW algorithms in different scenarios. Our proposed BW algorithm improves the residual networks by a significant margin on ImageNet classification. In addition, we show that the stochasticity of BW can improve the GAN's performance, although at the cost of training stability.

preprint2020arXiv

Chinese Named Entity Recognition Augmented with Lexicon Memory

Inspired by the concept of content-addressable retrieval from cognitive science, we propose a novel fragment-based model augmented with a lexicon-based memory for Chinese NER, in which both the character-level and word-level features are combined to generate better feature representations for possible name candidates. It is observed that locating the boundary information of entity names is useful in order to classify them into pre-defined categories. Position-dependent features, including prefix and suffix, are introduced for NER in the form of distributed representations. The lexicon-based memory is used to help generate such position-dependent features and deal with the problem of out-of-vocabulary words. Experimental results showed that the proposed model, called LEMON, achieved state-of-the-art results on four datasets.

preprint2020arXiv

Classical and quantum order in hyperkagome antiferromagnets

Motivated by recent experiments and density functional theory calculations on choloalite PbCuTe$_2$O$_6$, which possesses a Cu-based three-dimensional hyperkagome lattice, we propose and study a $J_1$-$J_2$-$J_3$ antiferromagnetic Heisenberg model on a hyperkagome lattice. In the classical limit, possible ground states are analyzed by two triangle rules, i.e., the "hyperkagome triangle rule" and the "isolated triangle rule," and classical Monte Carlo simulations are exploited to identify possible classical magnetic ordering and explore the phase diagram. In the quantum regime, Schwinger boson theory is applied to study possible quantum spin liquid states and long-range magnetically ordered states on an equal footing. These quantum states with bosonic partons are classified and analyzed by using projective symmetry groups (PSGs). It is found that there are only four types of algebraic PSGs allowed by the space group $P4_{1}32$ on a hyperkagome lattice. Moreover, there are only two types of PSGs that are compatible with the $J_1$-$J_2$-$J_3$ Heisenberg model. These two types of $Z_2$ bosonic states are distinguished by the gauge-invariant flux on the elementary ten-site loops on the hyperkagome network, called zero-flux state and $π$-flux state respectively. Both the zero-flux state and the $π$-flux state are able to give rise to quantum spin liquid states as well as magnetically ordered states, and the zero-flux states and the $π$-flux states can be distinguished by the lower and upper edges of the spectral function $S(\bm{q},ω)$, which can be measured by inelastic neutron scattering experiments.

preprint2020arXiv

Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble

Although neural networks have achieved prominent performance on many natural language processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we propose Dirichlet Neighborhood Ensemble (DNE), a randomized smoothing method for training a robust model to defend against substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments the training data with them. In this way, the model becomes robust to adversarial attacks while maintaining its performance on the original clean data. DNE is agnostic to the network architecture and scales to large models for NLP applications. We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
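
The sampling step described in this abstract can be sketched as follows. This is an illustrative toy, not the authors' implementation: it assumes convex-combination weights drawn from a symmetric Dirichlet distribution (built from Gamma draws), and the embedding table, synonym map, concentration value `alpha`, and function names are all hypothetical.

```python
import random

# Hypothetical 3-dimensional embeddings and synonym sets, for illustration only.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [1.0, 0.2, 0.1],
    "fine": [0.7, 0.0, 0.2],
    "movie": [0.0, 0.8, 0.5],
}
SYNONYMS = {"good": ["great", "fine"]}


def sample_dirichlet(alphas, rng):
    """Draw Dirichlet weights by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_virtual_embedding(word, rng, alpha=1.0):
    """Sample a point in the convex hull of a word and its synonyms."""
    vertices = [EMBEDDINGS[word]] + [EMBEDDINGS[s] for s in SYNONYMS.get(word, [])]
    weights = sample_dirichlet([alpha] * len(vertices), rng)
    dim = len(vertices[0])
    point = [sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)]
    return point, weights


def virtual_sentence(words, rng, alpha=1.0):
    """Replace every word embedding with a sampled point from its hull."""
    return [sample_virtual_embedding(w, rng, alpha)[0] for w in words]


if __name__ == "__main__":
    rng = random.Random(0)
    for vec in virtual_sentence(["good", "movie"], rng):
        print(vec)
```

A word without listed synonyms has a single-vertex hull, so its sampled embedding is just its original embedding; only words with synonyms are perturbed.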

preprint2020arXiv

Efficient tensor network representation for Gutzwiller projected states of paired fermions

Recent work by Wu et al. [arXiv:1910.11011] proposed a numerical method, the so-called matrix product operator-matrix product state (MPO-MPS) method, by which several types of quantum many-body wave functions, in particular, the projected Fermi sea state, can be efficiently represented as a tensor network. In this paper, we generalize the MPO-MPS method to study Gutzwiller projected paired states of fermions, where the maximally localized Wannier orbitals for Bogoliubov quasiparticles/quasiholes have been adapted to improve the computational performance. The study of $SO(3)$-symmetric spin-1 chains reveals that this new method has better performance than variational Monte Carlo for gapped states and similar performance for gapless states. Moreover, we demonstrate that dynamic correlation functions can be easily evaluated by combining this method with other accurate MPS-based approaches, such as the Chebyshev MPS method.

preprint2020arXiv

Exploring the Hierarchy in Relation Labels for Scene Graph Generation

By assigning each relationship a single label, current approaches formulate relationship detection as a classification problem. Under this formulation, predicate categories are treated as completely different classes. However, unlike object labels, where different classes have explicit boundaries, predicates usually overlap in their semantic meanings. For example, sit_on and stand_on have common meanings in vertical relationships but differ in the details of how the two objects are vertically placed. In order to leverage the inherent structures of the predicate categories, we propose to first build the language hierarchy and then utilize the Hierarchy Guided Feature Learning (HGFL) strategy to learn better region features at both the coarse-grained level and the fine-grained level. In addition, we propose the Hierarchy Guided Module (HGM) to utilize the coarse-grained level to guide the learning of fine-grained level features. Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50 on the task of Scene Graph Generation in different datasets.

preprint2020arXiv

Generative Tweening: Long-term Inbetweening of 3D Human Motions

The ability to generate complex and realistic human body animations at scale, while following specific artistic constraints, has been a fundamental goal for the game and animation industry for decades. Popular techniques include key-framing, physics-based simulation, and database methods via motion graphs. Recently, motion generators based on deep learning have been introduced. Although these learning models can automatically generate highly intricate stylized motions of arbitrary length, they still lack user control. To this end, we introduce the problem of long-term inbetweening, which involves automatically synthesizing complex motions over a long time interval given very sparse keyframes by users. We identify a number of challenges related to this problem, including maintaining biomechanical and keyframe constraints, preserving natural motions, and designing the entire motion sequence holistically while considering all constraints. We introduce a biomechanically constrained generative adversarial network that performs long-term inbetweening of human motions, conditioned on keyframe constraints. This network uses a novel two-stage approach where it first predicts local motion in the form of joint angles, and then predicts global motion, i.e. the global path that the character follows. Since there are typically a number of possible motions that could satisfy the given user constraints, we also enable our network to generate a variety of outputs with a scheme that we call Motion DNA. This approach allows the user to manipulate and influence the output content by feeding seed motions (DNA) to the network. Trained with 79 classes of captured motion data, our network performs robustly on a variety of highly complex motion styles.

preprint2020arXiv

GFTE: Graph-based Financial Table Extraction

Tabular data is a crucial form of information expression, which can organize data in a standard structure for easy information retrieval and comparison. However, in the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) and images, which are difficult to extract directly. In this paper, to facilitate deep learning based table extraction from unstructured digital files, we publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds and their corresponding structure representation in JSON. In addition, we propose a novel graph-based convolutional neural network model named GFTE as a baseline for future comparison. GFTE integrates image features, position features and textual features for precise edge prediction and reaches overall good results.

preprint2020arXiv

History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms

Variance-reduced algorithms, although they achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply a prescribed batch-size adaptation rule or exploit the information along the optimization path via additional backtracking and condition verification steps. In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along the optimization path by adapting the batch size via history stochastic gradients. We further theoretically show that such a scheme substantially reduces the overall complexity for the popular variance-reduced algorithms SVRG and SARAH/SPIDER for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.

preprint2020arXiv

IBM Federated Learning: an Enterprise Framework White Paper V0.1

Federated Learning (FL) is an approach to conduct machine learning without centralizing training data in a single place, for reasons of privacy, confidentiality or data volume. However, solving federated machine learning problems raises issues above and beyond those of centralized machine learning. These issues include setting up communication infrastructure between parties, coordinating the learning process, integrating party results, understanding the characteristics of the training data sets of different participating parties, handling data heterogeneity, and operating in the absence of a verification data set. IBM Federated Learning provides infrastructure and coordination for federated learning. Data scientists can design and run federated learning jobs based on existing, centralized machine learning models and can provide high-level instructions on how to run the federation. The framework applies to both Deep Neural Networks and "traditional" approaches for the most common machine learning libraries. IBM Federated Learning enables data scientists to expand their scope from centralized to federated machine learning, minimizing the learning curve at the outset while also providing the flexibility to deploy to different compute environments and design custom fusion algorithms.

preprint2020arXiv

Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Images

Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.

preprint2020arXiv

Learning to Generate Diverse Dance Motions with Transformer

With the ongoing pandemic, virtual concerts and live events using digitized performances of musicians are getting traction on massive multiplayer online worlds. However, well choreographed dance movements are extremely complex to animate and would involve an expensive and tedious production process. In addition to the use of complex motion capture systems, it typically requires a collaborative effort between animators, dancers, and choreographers. We introduce a complete system for dance motion synthesis, which can generate complex and highly diverse dance sequences given an input music sequence. As motion capture data is limited for the range of dance motions and styles, we introduce a massive dance motion data set that is created from YouTube videos. We also present a novel two-stream motion transformer generative model, which can generate motion sequences with high flexibility. We also introduce new evaluation metrics for the quality of synthesized dance motions, and demonstrate that our system can outperform state-of-the-art methods. Our system provides high-quality animations suitable for large crowds for virtual concerts and can also be used as reference for professional animation pipelines. Most importantly, we show that vast online videos can be effective in training dance motion models.

preprint2020arXiv

Measurement of the neutron beam profile of the Back-n white neutron facility at CSNS with a Micromegas detector

The Back-n white neutron beam line, which uses back-streaming white neutrons from the spallation target of the China Spallation Neutron Source, is used for nuclear data measurements. A Micromegas-based neutron detector with two variants was specially developed to measure the beam spot distribution for this beam line. In this article, the design, fabrication, and characterization of the detector are described. The results of the detector performance tests are presented, which include the relative electron transparency, the gain and the gain uniformity, and the neutron beam profile reconstruction capability. The result of the first measurement of the Back-n neutron beam spot distribution is also presented.

preprint2020arXiv

Momentum with Variance Reduction for Nonconvex Composition Optimization

Composition optimization is widely applied in nonconvex machine learning. Various advanced stochastic algorithms that adopt momentum and variance reduction techniques have been developed for composition optimization. However, these algorithms do not fully exploit both techniques to accelerate convergence and lack convergence guarantees in nonconvex optimization. This paper complements the existing literature by developing various momentum schemes with SPIDER-based variance reduction for nonconvex composition optimization. In particular, our momentum design requires fewer proximal mapping evaluations per iteration than the existing Katyusha momentum. Furthermore, our algorithm achieves near-optimal sample complexity results in both nonconvex finite-sum and online composition optimization and achieves a linear convergence rate under the gradient dominant condition. Numerical experiments demonstrate that our algorithm converges significantly faster than existing algorithms in nonconvex composition optimization.

preprint2020arXiv

Nanoscale structure detection and monitoring of tumour growth with optical coherence tomography

Approximately 90% of cancers have their origins in epithelial tissues and this leads to epithelial thickening, but the ultrastructural changes and underlying architecture are less well known. Depth-resolved, label-free visualization of nanoscale tissue morphology is required to reveal the extent and distribution of ultrastructural changes in underlying tissue, but is difficult to achieve with existing imaging modalities. We developed a nanosensitive optical coherence tomography (nsOCT) approach to provide such imaging based on dominant axial structure with a detection accuracy of a few nanometres. nsOCT maps the distribution of axial structural sizes an order of magnitude smaller than the axial resolution of the system. We validated the nsOCT methodology by detecting synthetic axial structure via numerical simulations. Subsequently, we validated the nsOCT technique experimentally by detecting known structures from a commercially fabricated sample. nsOCT reveals scaling with different depth of dominant submicron structural changes associated with carcinoma which may inform the origins of the disease, its progression and improve diagnosis.

preprint2020arXiv

Nanosensitive optical coherence tomography to assess wound healing within the cornea

Optical Coherence Tomography (OCT) is a non-invasive depth resolved optical imaging modality, that enables high resolution, cross-sectional imaging in biological tissues and materials at clinically relevant depths. Though OCT offers high resolution imaging, the best ultra-high-resolution OCT systems are limited to imaging structural changes with a resolution of one micron on a single B-scan within very limited depth. Nanosensitive OCT (nsOCT) is a recently developed technique that is capable of providing enhanced sensitivity of OCT to structural changes. Improving the sensitivity of OCT to detect structural changes at the nanoscale level, to a depth typical for conventional OCT, could potentially improve the diagnostic capability of OCT in medical applications. In this paper, we demonstrate the capability of nsOCT to detect structural changes deep in the rat cornea following superficial corneal injury.

preprint2020arXiv

Non-invasive detection of nanoscale structural changes in cornea associated with cross-linking treatment

Corneal cross-linking (CXL) using UVA irradiation with a riboflavin photosensitizer has grown from an interesting concept to a practical clinical treatment for corneal ectatic diseases globally, such as keratoconus. To characterize the corneal structural changes, existing methods such as X-ray microscopy, transmission electron microscopy (TEM), histology and optical coherence tomography have been used. However, these methods have various drawbacks such as invasive detection, the impossibility of in vivo measurement, or limited resolution and sensitivity to structural alterations. Here, we report the application of the over-sampling nano-sensitive optical coherence tomography (nsOCT) method for probing the corneal structural alterations. The results indicate that the spatial period increases slightly after 30 min of riboflavin instillation but decreases significantly after 30 min of UVA irradiation following the Dresden protocol. The proposed non-invasive method can be implemented using an existing OCT system, without any additional components, for detecting nanoscale changes with the potential to assist diagnostic assessment during CXL treatment, and possibly to be a real-time monitoring tool in clinics.

preprint2020arXiv

On the Continuity of Rotation Representations in Neural Networks

In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.
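The abstract does not spell out the 6D construction. One plausible reading, assumed here, keeps the first two columns of the rotation matrix as the 6D vector and recovers the full matrix by Gram-Schmidt orthogonalization. The function names `rotation_from_6d` and `rotation_to_6d` are hypothetical, and this is a minimal sketch rather than the authors' implementation.

```python
import numpy as np


def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix (assumed Gram-Schmidt construction)."""
    x = np.asarray(x, dtype=float)
    a1, a2 = x[:3], x[3:]
    # First column: normalize the first 3D vector.
    b1 = a1 / np.linalg.norm(a1)
    # Second column: remove the component along b1, then normalize.
    a2_perp = a2 - np.dot(b1, a2) * b1
    b2 = a2_perp / np.linalg.norm(a2_perp)
    # Third column: the cross product completes a right-handed frame.
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)


def rotation_to_6d(R):
    """Drop the third column of a rotation matrix to get the 6D vector."""
    R = np.asarray(R, dtype=float)
    return np.concatenate([R[:, 0], R[:, 1]])


R = rotation_from_6d([1.0, 2.0, 3.0, 0.0, 1.0, 0.5])
```

The mapping is undefined when the two input vectors are parallel or when either is zero, so a training pipeline would need to guard against those inputs.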

preprint2020arXiv

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

Various types of parameter restart schemes have been proposed for accelerated gradient algorithms to facilitate their practical convergence in convex optimization. However, the convergence properties of accelerated gradient algorithms under parameter restart remain obscure in nonconvex optimization. In this paper, we propose a novel accelerated proximal gradient algorithm with parameter restart (named APG-restart) for solving nonconvex and nonsmooth problems. Our APG-restart is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization. Numerical experiments demonstrate the effectiveness of our proposed algorithm.

preprint2020arXiv

Reanalysis of Variance Reduced Temporal Difference Learning

Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but vanilla TD can suffer substantially from the inherent optimization variance. A variance reduced TD (VRTD) algorithm was proposed by Korda and La (2015), which applies the variance reduction technique directly to online TD learning with Markovian samples. In this work, we first point out the technical errors in the analysis of VRTD in Korda and La (2015), and then provide a mathematically solid analysis of the non-asymptotic convergence of VRTD and its variance reduction performance. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate. Furthermore, the variance error (for both i.i.d. and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction in comparison to those of vanilla TD. As a result, the overall computational complexity of VRTD to attain a given accurate solution outperforms that of TD under Markov sampling and outperforms that of TD under i.i.d. sampling for a sufficiently small condition number.

preprint2020arXiv

Small-floating Target Detection in Sea Clutter via Visual Feature Classifying in the Time-Doppler Spectra

It is challenging for a surface radar to detect small floating objects in sea clutter. In this paper, we observe that the backscatter from the target breaks the continuity of the underlying motion of the sea surface in the time-Doppler spectra (TDS) images. Following this visual clue, we exploit the local binary pattern (LBP) to measure the variations of texture in the TDS images. It is shown that the radar returns containing a target and those having only clutter are separable in the feature space of LBP. An unsupervised one-class support vector machine (SVM) is then utilized to detect the deviation of the LBP histogram from that of the clutter. The outlier of the detector is classified as the target. On the real-life IPIX radar data sets, our visual feature based detector shows a favorable detection rate compared to three other existing approaches.
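The abstract does not say which LBP variant is used. As a minimal sketch, assuming the basic 3x3, 8-neighbour LBP, the texture histogram of a TDS image could be computed as below. The function name `lbp_histogram` is hypothetical, and the resulting 256-bin histogram is the kind of feature vector a one-class SVM could then be fitted on.

```python
import numpy as np


def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Assumes a 2D array with at least 3 rows and 3 columns; border pixels
    are skipped because they lack a full neighbourhood.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, column), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


flat_hist = lbp_histogram(np.ones((5, 5)))
peak = np.zeros((3, 3))
peak[1, 1] = 1.0
peak_hist = lbp_histogram(peak)
```

A perfectly flat patch puts all its mass in code 255, while an isolated bright pixel produces code 0, which illustrates how a local break in texture shifts the histogram.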

preprint2020arXiv

Spatio-temporal Attention Model for Tactile Texture Recognition

Recently, tactile sensing has attracted great interest in robotics, especially for facilitating exploration of unstructured environments and effective manipulation. A detailed understanding of the surface textures via tactile sensing is essential for many of these tasks. Previous works on texture recognition using camera based tactile sensors have been limited to treating all regions in one tactile image or all samples in one tactile sequence equally, which includes much irrelevant or redundant information. In this paper, we propose a novel Spatio-Temporal Attention Model (STAM) for tactile texture recognition, which is the very first of its kind to the best of our knowledge. The proposed STAM pays attention to both the spatial focus of each single tactile texture and the temporal correlation of a tactile sequence. In the experiments to discriminate 100 different fabric textures, the spatially and temporally selective attention has resulted in a significant improvement of the recognition accuracy, by up to 18.8%, compared to the non-attention based models. Specifically, after introducing noisy data that is collected before the contact happens, our proposed STAM can learn the salient features efficiently and the accuracy can increase by 15.23% on average compared with the CNN based baseline approach. The improved tactile texture perception can be applied to facilitate robot tasks like grasping and manipulation.

preprint2020arXiv

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down the convergence in practice, and cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with a provable convergence guarantee. In particular, we show that proximal SpiderBoost achieves an oracle complexity of $\mathcal{O}(\min\{n^{1/2}ε^{-2},ε^{-3}\})$ in composite nonconvex optimization, improving the state-of-the-art result by a factor of $\mathcal{O}(\min\{n^{1/6},ε^{-1/3}\})$. We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.
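Reading the stated improvement factor literally, the earlier bound being improved can be recovered by multiplying the two quoted expressions. This is arithmetic on the numbers in the abstract, not a statement taken from the paper. Both minima switch branches at the same threshold $n = ε^{-2}$, so the product is again a minimum of two terms (the `cases` environment assumes the `amsmath` package):

```latex
\[
\min\{n^{1/2}\epsilon^{-2},\ \epsilon^{-3}\}\cdot\min\{n^{1/6},\ \epsilon^{-1/3}\}
=
\begin{cases}
n^{2/3}\epsilon^{-2}, & n \le \epsilon^{-2},\\[2pt]
\epsilon^{-10/3}, & n > \epsilon^{-2},
\end{cases}
\;=\; \min\{n^{2/3}\epsilon^{-2},\ \epsilon^{-10/3}\}.
\]
```

For example, with $n = 10^6$ and $ε = 10^{-2}$ (so $n > ε^{-2}$), the quoted bound is $ε^{-3} = 10^{6}$ and the implied earlier bound is $ε^{-10/3} \approx 4.6 \times 10^{6}$.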

preprint2020arXiv

The Complexity of the Partition Coloring Problem

Given a simple undirected graph $G=(V,E)$ and a partition of the vertex set $V$ into $p$ parts, the Partition Coloring Problem (PCP) asks if we can select one vertex from each part of the partition such that the chromatic number of the subgraph induced on the $p$ selected vertices is bounded by $k$. PCP generalizes the classical Vertex Coloring Problem and has applications in many areas, such as scheduling and encoding. In this paper, we show the complexity status of the Partition Coloring Problem with three parameters: the number of colors, the number of parts of the partition, and the maximum size of each part of the partition. Furthermore, we give a new exact algorithm for this problem.

preprint2020arXiv

TiFL: A Tier-based Federated Learning System

Federated Learning (FL) enables learning a shared model across many clients without violating privacy requirements. One of the key attributes of FL is the heterogeneity that exists in both resources and data, due to differences in computation and communication capacity as well as in the quantity and content of data among different clients. We conduct a case study to show that heterogeneity in resources and data has a significant impact on training time and model accuracy in conventional FL systems. To this end, we propose TiFL, a Tier-based Federated Learning System, which divides clients into tiers based on their training performance and selects clients from the same tier in each training round to mitigate the straggler problem caused by heterogeneity in resources and data quantity. To further tame the heterogeneity caused by non-IID (not independent and identically distributed) data and resources, TiFL employs an adaptive tier selection approach to update the tiering on-the-fly based on the observed training performance and accuracy over time. We prototype TiFL in an FL testbed following Google's FL architecture and evaluate it using popular benchmarks and the state-of-the-art FL benchmark LEAF. Experimental evaluation shows that TiFL outperforms conventional FL in various heterogeneous conditions. With the proposed adaptive tier selection policy, we demonstrate that TiFL achieves much faster training performance while keeping the same (and in some cases better) test accuracy across the board.
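The tiering idea described above can be sketched as follows. This is an illustrative outline under stated assumptions, not TiFL's actual implementation: training performance is assumed to be a measured per-client latency, tiers are assumed to be equal-sized slices of the latency ranking, and the function names `build_tiers` and `select_clients` as well as the tier probabilities are hypothetical.

```python
import random


def build_tiers(latencies, num_tiers):
    """Group client ids into tiers by measured latency, fastest tier first."""
    ordered = sorted(latencies, key=latencies.get)
    n = len(ordered)
    # Slice the ranking into num_tiers contiguous groups of near-equal size.
    return [ordered[i * n // num_tiers:(i + 1) * n // num_tiers]
            for i in range(num_tiers)]


def select_clients(tiers, tier_probs, clients_per_round, rng):
    """Pick one tier at random (weighted), then sample clients within it."""
    tier_index = rng.choices(range(len(tiers)), weights=tier_probs, k=1)[0]
    tier = tiers[tier_index]
    chosen = rng.sample(tier, min(clients_per_round, len(tier)))
    return tier_index, chosen


# Hypothetical latencies in seconds for six clients.
latencies = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 9.0, "e": 3.0, "f": 7.0}
tiers = build_tiers(latencies, num_tiers=3)
tier_index, chosen = select_clients(tiers, [1.0, 0.0, 0.0], 2, random.Random(0))
```

Because every client in a round comes from one tier, the round finishes at roughly the pace of that tier rather than of the slowest client overall. An adaptive policy like the one the abstract mentions would adjust `tier_probs` between rounds, which this sketch leaves out.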

preprint2020arXiv

Timing Performance of a Micro-Channel-Plate Photomultiplier Tube

The spatial dependence of the timing performance of the R3809U-50 Micro-Channel-Plate PMT (MCP-PMT) by Hamamatsu was studied in high energy muon beams. Particle position information is provided by a GEM tracker telescope, while timing is measured relative to a second MCP-PMT, identical in construction. In the inner part of the circular active area (radius r < 5.5 mm) the time resolution of the two MCP-PMTs combined is better than 10 ps. The signal amplitude decreases in the outer region due to less light reaching the photocathode, resulting in a worse time resolution. The observed radial dependence is in quantitative agreement with a dedicated simulation. With this characterization, the suitability of MCP-PMTs as $t_0$ reference detectors has been validated.

preprint2020arXiv

Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle

Although SGD with random reshuffle has been widely used in machine learning applications, there is a limited understanding of how model characteristics affect the convergence of the algorithm. In this work, we introduce model incoherence to characterize the diversity of model characteristics and study its impact on convergence of SGD with random reshuffle under weak strong convexity. Specifically, minimizer incoherence measures the discrepancy between the global minimizers of a sample loss and those of the total loss and affects the convergence error of SGD with random reshuffle. In particular, we show that the variable sequence generated by SGD with random reshuffle converges to a certain global minimizer of the total loss under full minimizer coherence. The other measure, curvature incoherence, captures the quality of the condition numbers of the sample losses and determines the convergence rate of SGD. With model incoherence, our results show that SGD has a faster convergence rate and smaller convergence error under random reshuffle than under random sampling, and hence provide justification for the superior practical performance of SGD with random reshuffle.

preprint2019arXiv

Evidence for nematic superconductivity of topological surface states in PbTaSe2

Spontaneous symmetry breaking has been a paradigm to describe the phase transitions in condensed matter physics. In addition to the continuous electromagnetic gauge symmetry, an unconventional superconductor can break discrete symmetries simultaneously, such as time reversal and lattice rotational symmetry. In this work we report a characteristic in-plane 2-fold behaviour of the resistive upper critical field and point-contact spectra on the superconducting semimetal PbTaSe2 with topological nodal-rings, despite its hexagonal lattice symmetry (or D_3h in bulk while C_3v on surface, to be precise). However, we do not observe any lattice rotational symmetry breaking signal from field-angle-dependent specific heat. It is worth noting that such surface-only electronic nematicity is in sharp contrast to the observation in the topological superconductor candidate, CuxBi2Se3, where the nematicity occurs in various bulk measurements. In combination with theory, superconducting nematicity is likely to emerge from the topological surface states of PbTaSe2, rather than the proximity effect. The issue of time reversal symmetry breaking is also addressed. Thus, our results on PbTaSe2 shed new light on possible routes to realize nematic superconductivity with nontrivial topology.

preprint2019arXiv

Extensive beam test study of prototype MRPCs for the T0 detector at the CSR external-target experiment

The CSR External-target Experiment (CEE) will be the first large-scale nuclear physics experiment device at the Cooling Storage Ring (CSR) of the Heavy-Ion Research Facility in Lanzhou (HIRFL) in China. A new T0 detector has been proposed to measure the multiplicity, angular distribution and timing information of charged particles produced in heavy-ion collisions at the target region. Multi-gap resistive plate chamber (MRPC) technology was chosen as part of the construction of the T0 detector, which provides precision event collision times (T0) and collision geometry information. The prototype was tested with hadron and heavy-ion beams to study its performance. By comparing the experimental results with a Monte Carlo simulation, the time resolution of the MRPCs is found to be $\sim$ 50 ps or better. The timing performance of the T0 detector, including both detector and readout electronics, was found to fulfil the requirements of the CEE.

preprint2019arXiv

Formation of finite-time singularities for nonlinear elastodynamics with small initial disturbances

This article concerns the formation of finite-time singularities in solutions to quasilinear hyperbolic systems with small initial data. By constructing a special test function, we first present a simpler proof of the main result in Sideris' "Formation of singularities in three-dimensional compressible fluids": a global classical solution does not exist for the compressible Euler equations even for some small initial data. Then we apply this approach to nonlinear elastodynamics and magnetohydrodynamics, showing that the classical solutions to these equations can still blow up in finite time even if the initial data is small enough.

preprint2019arXiv

High Speed Mid-Infrared Interband Cascade Photodetector Based on InAs/GaSb Type-II Superlattice

High speed mid-wave infrared (MWIR) photodetectors have applications in the areas such as free space optical communication and frequency comb spectroscopy. However, most of the research on the MWIR photodetectors is focused on how to increase the quantum efficiency and reduce the dark current, in order to improve the detectivity (D*), and the 3dB bandwidth performance of the corresponding MWIR photodetectors is still not fully studied. In this work, we report and characterize a MWIR interband cascade photodetector based on InAs/GaSb type-II superlattice with a 50% cutoff wavelength at ~5.3 um at 300 K. The 3 dB cutoff frequency is 2.4 GHz at 300 K, for a 40 μm circular diameter device under -5 V applied bias. Limitations on the detector high speed performance are also discussed.

preprint2019arXiv

On some conjectures by Lu and Wenzel

In order to give a unified generalization of the BW inequality and the DDVV inequality, Lu and Wenzel proposed three Conjectures 1, 2, 3 and an open Question 1 in 2016. In this paper we discuss further these conjectures and put forward several new conjectures which will be shown equivalent to Conjecture 2. In particular, we prove Conjecture 2 and hence all conjectures in some special cases. For Conjecture 3, we obtain a bigger upper bound $2+\sqrt{10}/2$, and we also give a weaker answer for the more general Question 1. In addition, we obtain some new simple proofs of the complex BW inequality and the condition for equality.

preprint2019arXiv

Superconductivity, pair density wave, and Neel order in cuprates

We investigate in underdoped cuprates possible coexistence of the superconducting (SC) order at zero momentum and pair density wave (PDW) at momentum ${\bf Q}=(π, π)$ in the presence of a Neel order. By symmetry, the $d$-wave uniform singlet pairing $dS_0$ can coexist with the $d$-wave triplet PDW $dT_{\bf Q}$, and the $p$-wave singlet PDW $pS_{\bf Q}$ can coexist with the $p$-wave uniform triplet $pT_0$. At half filling, we find the novel $pS_{\bf Q}+pT_0$ state is energetically more favorable than the $dS_0+dT_{\bf Q}$ state. At finite doping, however, the $dS_0+dT_{\bf Q}$ state is more favorable. In both types of states, the variational triplet parameters, $dT_{\bf Q}$ and $pT_0$, are of secondary significance. Our results point to a fully symmetric $\mathrm{Z_2}$ quantum spin liquid with spinon Fermi surface in proximity to the Neel order at zero doping, and to intertwined $d$-wave triplet PDW fluctuations and spin moment fluctuations along with the dominant $d$-wave singlet SC at finite doping. The results are obtained by variational quantum Monte Carlo simulations.

preprint2019arXiv

Supervised Encoding for Discrete Representation Learning

Classical supervised classification tasks search for a nonlinear mapping that maps each encoded feature directly to a probability mass over the labels. Such a learning framework typically lacks the intuition that encoded features from the same class tend to be similar and thus has little interpretability for the learned features. In this paper, we propose a novel supervised learning model named Supervised-Encoding Quantizer (SEQ). The SEQ applies a quantizer to cluster and classify the encoded features. We found that the quantizer provides an interpretable graph where each cluster in the graph represents a class of data samples that have a particular style. We also trained a decoder that can decode convex combinations of the encoded features from similar and different clusters and provide guidance on style transfer between sub-classes.

preprint2016arXiv

The effect of in-plane magnetic field and applied strain in quantum spin Hall systems: application to InAs/GaSb quantum wells

Motivated by the recent discovery of the quantized spin Hall effect in InAs/GaSb quantum wells \cite{du2013,xu2014}, we theoretically study the effects of in-plane magnetic field and strain on the quantization of charge conductance by using the Landauer-Büttiker formalism. Our theory predicts a robustness of the conductance quantization against the magnetic field up to a very high field of 20 tesla. We use a disordered hopping term to model the strain and show that the strain may help the quantization of the conductance. Relevance to the experiments will be discussed.

preprint2016arXiv

Theory for Spin Selective Andreev Reflection in Vortex Core of Topological Superconductor: Majorana Zero Modes on Spherical Surface and Application to Spin Polarized Scanning Tunneling Microscope Probe

Majorana zero modes (MZMs) have been predicted to exist in the topological insulator (TI)/superconductor (SC) heterostructure. A recent spin polarized scanning tunneling microscope (STM) experiment$^{1}$ has observed spin-polarization dependence of the zero bias differential tunneling conductance at the center of the vortex core, which may be attributed to the spin selective Andreev reflection, a novel property of the MZMs theoretically predicted in a 1-dimensional nanowire$^{2}$. Here we consider a helical electron system described by a Rashba spin orbit coupling Hamiltonian on a spherical surface with an s-wave superconducting pairing due to the proximity effect. We examine in-gap excitations of a pair of vortices with one at the north pole and the other at the south pole. While the MZM is not a spin eigenstate, the spin wavefunction of the MZM at the center of the vortex core, r = 0, is parallel to the magnetic field, and the local Andreev reflection of the MZM is spin selective, namely it occurs only when the STM tip has its spin polarization parallel to the magnetic field, similar to the case in a 1-dimensional nanowire$^{2}$. The total local differential tunneling conductance consists of the normal term proportional to the local density of states and an additional term arising from the Andreev reflection. We also discuss the finite size effect, for which the MZM at the north pole is hybridized with the MZM at the south pole. We apply our theory to examine the recently reported spin-polarized STM experiments and show good agreement with the experiments.