Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
40works
0followers
23topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

40 published item(s)

preprint2026arXiv

Asymptotic Convergence and Stability of Adaptive Gradient Methods in Smooth Non-convex Optimization

Adaptive gradient methods, such as AdaGrad, have become fundamental tools in deep learning. Despite their widespread use, the asymptotic convergence of AdaGrad remains poorly understood in non-convex scenarios. In this work, we present the first rigorous asymptotic convergence analysis of AdaGrad-Norm for smooth non-convex optimization. Using a novel stopping-time partitioning technique, we establish a key stability result: the objective function values remain bounded in expectation, and the iterates are bounded almost surely under a mild coercivity assumption. Building on these stability results, we prove that AdaGrad-Norm achieves both almost sure and mean-square convergence. Furthermore, we extend our analysis to RMSProp and show that, with appropriate hyperparameter choices, it also enjoys stability and asymptotic convergence. The techniques developed herein may be of independent interest for analyzing other adaptive stochastic optimization algorithms.

preprint2026arXiv

CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records

Large Language Models (LLMs) hold significant promise for improving clinical decision support and reducing physician burnout by synthesizing complex, longitudinal cancer Electronic Health Records (EHRs). However, their implementation in this critical field faces three primary challenges: the inability to effectively process the extensive length and fragmented nature of patient records for accurate temporal analysis; a heightened risk of clinical hallucination, as conventional grounding techniques such as Retrieval-Augmented Generation (RAG) do not adequately incorporate process-oriented clinical guidelines; and unreliable evaluation metrics that hinder the validation of AI systems in oncology. To address these issues, we propose CliCARE, a framework for Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records. The framework operates by transforming unstructured, longitudinal EHRs into patient-specific Temporal Knowledge Graphs (TKGs) to capture long-range dependencies, and then grounding the decision support process by aligning these real-world patient trajectories with a normative guideline knowledge graph. This approach provides oncologists with evidence-grounded decision support by generating a high-fidelity clinical summary and an actionable recommendation. We validated our framework using large-scale, longitudinal data from a private Chinese cancer dataset and the public English MIMIC-IV dataset. In these settings, CliCARE significantly outperforms baselines, including leading long-context LLMs and Knowledge Graph-enhanced RAG methods. The clinical validity of our results is supported by a robust evaluation protocol, which demonstrates a high correlation with assessments made by oncologists.

preprint2026arXiv

Concordia: Self-Improving Synthetic Tables for Federated LLMs

Federated learning (FL) enables training large language models (LLMs) without sharing raw data, but adapting LLMs under strict data isolation and non-IID client distributions remains challenging in practice. Synthetic data offers a natural privacy-preserving surrogate for local training, yet existing federated pipelines typically treat synthetic generation as static or loosely coupled with downstream optimization, leading to rapidly diminishing utility under heterogeneous clients. We study federated adaptation of LLMs on tabular tasks where raw records and validation data cannot be shared, and local training must rely entirely on synthetic tables. We propose Concordia, a tri-level optimization framework that aligns synthetic data generation with federated validation utility despite these constraints. At the client level, models are adapted via parameter-efficient LoRA training on synthetic tables. Clients additionally learn lightweight utility scorers from private validation feedback to reweight synthetic samples during local training. At the outer level, each client refines its own synthetic table generator using group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data. Experiments on privacy-sensitive tabular benchmarks from finance and healthcare demonstrate that Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift compared to static and decoupled synthetic-data baselines.

preprint2026arXiv

CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters

As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from \textbf{Mean Collapse}, converging to a generic average that fails to represent diverse groups. We attribute this to \textbf{Cultural Sparsity}, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose \textbf{\textsc{CuMA}} (\textbf{Cu}ltural \textbf{M}ixture of \textbf{A}dapters), a framework that frames alignment as a \textbf{conditional capacity separation} problem. By incorporating demographic-aware routing, \textsc{CuMA} internalizes a \textit{Latent Cultural Topology} to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that \textsc{CuMA} achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that \textsc{CuMA} effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.

preprint2026arXiv

Herculean: An Agentic Benchmark for Financial Intelligence

As AI agents improve, the central question is no longer whether they can solve isolated well-defined financial tasks, but whether they can reliably carry out financial professional work. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static competencies such as question answering, retrieval, summarization, and classification. We introduce Herculean, the first skilled benchmark for agentic financial intelligence spanning four representative workflows, including Trading, Hedging, Market Insights, and Auditing. Each workflow is instantiated as a standardized MCP-based skill environment with its own tools, interaction dynamics, constraints, and success criteria, enabling consistent end-to-end assessment of heterogeneous agent systems. Across frontier agents, we find agents perform relatively well on Trading and Market Insights, but struggle substantially on Hedging and Auditing, where long-horizon coordination, state consistency, and structured verification are critical. Overall, our results point to a key gap in current agents in turning financial reasoning into dependable workflow execution in high-stakes financial workflows.

preprint2026arXiv

PID-Guided Partial Alignment for Multimodal Decentralized Federated Learning

Multimodal decentralized federated learning (DFL) is challenging because agents differ in available modalities and model architectures, yet must collaborate over peer-to-peer (P2P) networks without a central coordinator. Standard multimodal pipelines learn a single shared embedding across all modalities. In DFL, such a monolithic representation induces gradient misalignment between uni- and multimodal agents; as a result, it suppresses heterogeneous sharing and cross-modal interaction. We present PARSE, a multimodal DFL framework that operationalizes partial information decomposition (PID) in a server-free setting. Each agent performs feature fission to factorize its latent representation into redundant, unique, and synergistic slices. P2P knowledge sharing among heterogeneous agents is enabled by slice-level partial alignment: only semantically shareable branches are exchanged among agents that possess the corresponding modality. By removing the need for central coordination and gradient surgery, PARSE resolves uni-/multimodal gradient conflicts, thereby overcoming the multimodal DFL dilemma while remaining compatible with standard DFL constraints. Across benchmarks and agent mixes, PARSE yields consistent gains over task-, modality-, and hybrid-sharing DFL baselines. Ablations on fusion operators and split ratios, together with qualitative visualizations, further demonstrate the efficiency and robustness of the proposed design.

preprint2026arXiv

Polarizable atomic multipoles for learning long-range electrostatics

Long-range electrostatics and polarization remain central obstacles to extending machine learning interatomic potentials (MLIPs) to ionic, polar, and interfacial systems. Here, we introduce a semi-local framework for learning electrostatics from energies and forces using polarizable atomic multipoles. Local equivariant descriptors predict environment-dependent latent monopoles, dipoles, and quadrupoles, while residual non-local charge transfer and polarization are captured by non-self-consistent linear response in induced charges and dipoles. Across four diverse benchmarks and four short-range MLIP architectures, the multipole hierarchy and response terms systematically improve potential energy surface accuracy, with the largest gains in systems where long-range effects are essential. More importantly, the learned latent variables recover physically meaningful electrical responses: accurate Born effective charge tensors, emergent polarizabilities, infrared spectra in close agreement with experiments, and semi-quantitative Raman spectra for bulk water and hybrid MAPbI$_3$ perovskite. This systematically improvable, physically transparent framework enables MLIPs trained on standard energy and force labels to predict polarization-sensitive observables.

preprint2026arXiv

Sampling non-log-concave densities via Hessian-free high-resolution dynamics

We study the problem of sampling from a target distribution $π(q)\propto e^{-U(q)}$ on $\mathbb{R}^d$, where $U$ can be non-convex, via the Hessian-free high-resolution (HFHR) dynamics, which is a second-order Langevin-type process that has $e^{-U(q)-\frac12|p|^2}$ as its unique invariant distribution, and it reduces to kinetic Langevin dynamics (KLD) as the resolution parameter $α\to0$. The existing theory for HFHR dynamics in the literature is restricted to strongly-convex $U$, although numerical experiments are promising for non-convex settings as well. We focus on studying the convergence of HFHR dynamics when $U$ can be non-convex, which bridges a gap between theory and practice. Under a standard assumption of dissipativity and smoothness on $U$, we adopt the reflection/synchronous coupling method. This yields a Lyapunov-weighted Wasserstein distance in which the HFHR semigroup is exponentially contractive for all sufficiently small $α>0$ whenever KLD is. We further show that, under an additional assumption that asymptotically $\nabla U$ has linear growth at infinity, the contraction rate for HFHR dynamics is strictly better than that of KLD, with an explicit gain. As a case study, we verify the assumptions and the resulting acceleration for three examples: a multi-well potential, Bayesian linear regression with $L^p$ regularizer and Bayesian binary classification. We conduct numerical experiments based on these examples, as well as an additional example of Bayesian logistic regression with real data processed by the neural networks, which illustrates the efficiency of the algorithms based on HFHR dynamics and verifies the acceleration and superior performance compared to KLD.

preprint2026arXiv

Toward Natural and Companionable Virtual Agents via Cross-Temporal Emotional Modeling

Recent advances in foundation models have enabled conversational agents that aim for sustained companionship rather than mere task completion. Yet most still remain unable to support natural, long-term companion-like interactions, resulting in experiences that feel episodic and inauthentic. We argue that current agents overlooked cross-temporal modeling of agents' social behaviors and internal emotions: generated behaviors rarely influence an agent's emotional state, and emotional states seldom shape subsequent behaviors. We present Cross-Temporal Emotion Modeling (CTEM), a framework that links long-term behavioral history to moment-to-moment emotional expression. CTEM establishes a closed loop where past experiences update an evolving emotional state; this state conditions immediate interactions; and user feedback continually revises both memory and emotional state, enabling reflection and anticipation. We instantiate CTEM as Auri, a companion agent on an instant-messaging platform, and report a 21-day in-the-wild study showing that CTEM shows improvements in perceived naturalness, coherence, and emotional harmony.

preprint2025arXiv

Fully First-Order Methods for Decentralized Bilevel Optimization

This paper focuses on decentralized stochastic bilevel optimization (DSBO) where agents only communicate with their neighbors. We propose Decentralized Stochastic Gradient Descent and Ascent with Gradient Tracking (DSGDA-GT), a novel algorithm that only requires first-order oracles that are much cheaper than second-order oracles widely adopted in existing works. We further provide a finite-time convergence analysis showing that for $n$ agents collaboratively solving the DSBO problem, the sample complexity of finding an $ε$-stationary point in our algorithm is $\mathcal{O}(n^{-1}ε^{-7})$, which matches the currently best-known results of the single-agent counterpart with linear speedup. The numerical experiments demonstrate both the communication and training efficiency of our algorithm.

preprint2023arXiv

Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics

Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products.

preprint2022arXiv

Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search

We propose a generic feature compression method for Approximate Nearest Neighbor Search (ANNS) problems, which speeds up existing ANNS methods in a plug-and-play manner. Specifically, based on transformer, we propose a new network structure to compress the feature into a low dimensional space, and an inhomogeneous neighborhood relationship preserving (INRP) loss that aims to maintain high search accuracy. Specifically, we use multiple compression projections to cast the feature into many low dimensional spaces, and then use transformer to globally optimize these projections such that the features are well compressed following the guidance from our loss function. The loss function is designed to assign high weights on point pairs that are close in original feature space, and keep their distances in projected space. Keeping these distances helps maintain the eventual top-k retrieval accuracy, and down weighting others creates room for feature compression. In experiments, we run our compression method on public datasets, and use the compressed features in graph based, product quantization and scalar quantization based ANNS solutions. Experimental results show that our compression method can significantly improve the efficiency of these methods while preserves or even improves search accuracy, suggesting its broad potential impact on real world applications.

preprint2022arXiv

FaceMap: Towards Unsupervised Face Clustering via Map Equation

Face clustering is an essential task in computer vision due to the explosion of related applications such as augmented reality or photo album management. The main challenge of this task lies in the imperfectness of similarities among image feature representations. Given an existing feature extraction model, it is still an unresolved problem that how can the inherent characteristics of similarities of unlabelled images be leveraged to improve the clustering performance. Motivated by answering the question, we develop an effective unsupervised method, named as FaceMap, by formulating face clustering as a process of non-overlapping community detection, and minimizing the entropy of information flows on a network of images. The entropy is denoted by the map equation and its minimum represents the least description of paths among images in expectation. Inspired by observations on the ranked transition probabilities in the affinity graph constructed from facial images, we develop an outlier detection strategy to adaptively adjust transition probabilities among images. Experiments with ablation studies demonstrate that FaceMap significantly outperforms existing methods and achieves new state-of-the-arts on three popular large-scale datasets for face clustering, e.g., an absolute improvement of more than $10\%$ and $4\%$ comparing with prior unsupervised and supervised methods respectively in terms of average of Pairwise F-score. Our code is publicly available on github.

preprint2022arXiv

Implementation of an Automated Learning System for Non-experts

Automated machine learning systems for non-experts could be critical for industries to adopt artificial intelligence to their own applications. This paper detailed the engineering system implementation of an automated machine learning system called YMIR, which completely relies on graphical interface to interact with users. After importing training/validation data into the system, a user without AI knowledge can label the data, train models, perform data mining and evaluation by simply clicking buttons. The paper described: 1) Open implementation of model training and inference through docker containers. 2) Implementation of task and resource management. 3) Integration of Labeling software. 4) Implementation of HCI (Human Computer Interaction) with a rebuilt collaborative development paradigm. We also provide subsequent case study on training models with the system. We hope this paper can facilitate the prosperity of our automated machine learning community from industry application perspective. The code of the system has already been released to GitHub (https://github.com/industryessentials/ymir).

preprint2022arXiv

Improving Adversarial Robustness via Mutual Information Estimation

Deep neural networks (DNNs) are found to be vulnerable to adversarial noise. They are typically misled by adversarial samples to make wrong predictions. To alleviate this negative effect, in this paper, we investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure the dependence by estimating the mutual information (MI) between outputs and the natural patterns of inputs (called natural MI) and MI between outputs and the adversarial patterns of inputs (called adversarial MI), respectively. We find that adversarial samples usually have larger adversarial MI and smaller natural MI compared with those w.r.t. natural samples. Motivated by this observation, we propose to enhance the adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during the training process. In this way, the target model is expected to pay more attention to the natural pattern that contains objective semantics. Empirical evaluations demonstrate that our method could effectively improve the adversarial accuracy against multiple attacks.

preprint2022arXiv

Lifted Bregman Training of Neural Networks

We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions. This formulation is based on Bregman distances and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions. Instead of estimating the parameters with a combination of first-order optimisation method and back-propagation (as is the state-of-the-art), we propose the use of non-smooth first-order optimisation methods that exploit the specific structure of the novel formulation. We present several numerical results that demonstrate that these training approaches can be equally well or even better suited for the training of neural network-based classifiers and (denoising) autoencoders with sparse coding compared to more conventional training frameworks.

preprint2022arXiv

Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation

In this paper we propose a multi-modal multi-correlation learning framework targeting at the task of audio-visual speech separation. Although previous efforts have been extensively put on combining audio and visual modalities, most of them solely adopt a straightforward concatenation of audio and visual features. To exploit the real useful information behind these two modalities, we define two key correlations which are: (1) identity correlation (between timbre and facial attributes); (2) phonetic correlation (between phoneme and lip motion). These two correlations together comprise the complete information, which shows a certain superiority in separating target speaker's voice especially in some hard cases, such as the same gender or similar content. For implementation, contrastive learning or adversarial training approach is applied to maximize these two correlations. Both of them work well, while adversarial training shows its advantage by avoiding some limitations of contrastive learning. Compared with previous research, our solution demonstrates clear improvement on experimental metrics without additional complexity. Further analysis reveals the validity of the proposed architecture and its good potential for future extension.

preprint2022arXiv

Multiplex-detection Based Multiple Instance Learning Network for Whole Slide Image Classification

Multiple instance learning (MIL) is a powerful approach to classify whole slide images (WSIs) for diagnostic pathology. A fundamental challenge of MIL on WSI classification is to discover the \textit{critical instances} that trigger the bag label. However, previous methods are primarily designed under the independent and identical distribution hypothesis (\textit{i.i.d}), ignoring either the correlations between instances or heterogeneity of tumours. In this paper, we propose a novel multiplex-detection-based multiple instance learning (MDMIL) to tackle the issues above. Specifically, MDMIL is constructed by the internal query generation module (IQGM) and the multiplex detection module (MDM) and assisted by the memory-based contrastive loss during training. Firstly, IQGM gives the probability of instances and generates the internal query (IQ) for the subsequent MDM by aggregating highly reliable features after the distribution analysis. Secondly, the multiplex-detection cross-attention (MDCA) and multi-head self-attention (MHSA) in MDM cooperate to generate the final representations for the WSI. In this process, the IQ and trainable variational query (VQ) successfully build up the connections between instances and significantly improve the model's robustness toward heterogeneous tumours. At last, to further enforce constraints in the feature space and stabilize the training process, we adopt a memory-based contrastive loss, which is practicable for WSI classification even with a single sample as input in each iteration. We conduct experiments on three computational pathology datasets, e.g., CAMELYON16, TCGA-NSCLC, and TCGA-RCC datasets. The superior accuracy and AUC demonstrate the superiority of our proposed MDMIL over other state-of-the-art methods.

preprint2022arXiv

On Uniform Boundedness Properties of SGD and its Momentum Variants

A theoretical, and potentially also practical, problem with stochastic gradient descent is that trajectories may escape to infinity. In this note, we investigate uniform boundedness properties of iterates and function values along the trajectories of the stochastic gradient descent algorithm and its important momentum variant. Under smoothness and $R$-dissipativity of the loss function, we show that broad families of step-sizes, including the widely used step-decay and cosine with (or without) restart step-sizes, result in uniformly bounded iterates and function values. Several important applications that satisfy these assumptions, including phase retrieval problems, Gaussian mixture models, and some neural network classifiers, are discussed in detail. We further extend the uniform boundedness of SGD and its momentum variant under the generalized dissipativity for the functions whose tails grow slower than quadratic functions. This includes some interesting applications, for example, Bayesian logistic regression and logistic regression with $\ell_1$ regularization.

preprint2022arXiv

ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Recently, vision transformers started to show impressive results which outperform large convolution based models significantly. However, in the area of small models for mobile or resource constrained devices, ConvNet still has its own advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which boasts a global receptive field while producing location sensitive features as in local convolutions. We combine the ParCs and squeeze-exictation ops to form a meta-former like model block, which further has the attention mechanism like transformers. The aforementioned block can be used in plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experiment results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision transformer based models in common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% parameters and 13% computational cost but gaining 0.2% higher accuracy and 23% faster inference speed (on ARM based Rockchip RK3288) compared with MobileViT, and uses only 0.5 times parameters but gaining 2.7% accuracy compared with DeIT. On MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance. Source code is available at https://github.com/hkzhang91/ParC-Net

preprint2022arXiv

Semi-parametric Makeup Transfer via Semantic-aware Correspondence

The large discrepancy between the source non-makeup image and the reference makeup image is one of the key challenges in makeup transfer. Conventional approaches for makeup transfer either learn disentangled representation or perform pixel-wise correspondence in a parametric way between two images. We argue that non-parametric techniques have a high potential for addressing the pose, expression, and occlusion discrepancies. To this end, this paper proposes a \textbf{S}emi-\textbf{p}arametric \textbf{M}akeup \textbf{T}ransfer (SpMT) method, which combines the reciprocal strengths of non-parametric and parametric mechanisms. The non-parametric component is a novel \textbf{S}emantic-\textbf{a}ware \textbf{C}orrespondence (SaC) module that explicitly reconstructs content representation with makeup representation under the strong constraint of component semantics. The reconstructed representation is desired to preserve the spatial and identity information of the source image while "wearing" the makeup of the reference image. The output image is synthesized via a parametric decoder that draws on the reconstructed representation. Extensive experiments demonstrate the superiority of our method in terms of visual quality, robustness, and flexibility. Code and pre-trained model are available at \url{https://github.com/AnonymScholar/SpMT.

preprint2022arXiv

Spin-selective tunneling from nanowires of the candidate topological Kondo insulator SmB6

Incorporating relativistic physics into quantum tunneling can lead to exotic behavior such as perfect transmission via Klein tunneling. Here, we probe the tunneling properties of spin-momentum locked relativistic fermions by designing and implementing a tunneling geometry that utilizes nanowires of the topological Kondo insulator candidate, SmB6. The nanowires are attached to the end of scanning tunneling microscope tips, and used to image the bicollinear stripe spin-order in the antiferromagnet Fe1.03Te with a Neel temperature of ~50 K. The antiferromagnetic stripes become invisible above 10 K concomitant with the suppression of the topological surface states. We further demonstrate that the direction of spin-polarization is tied to the tunneling direction. Our technique establishes SmB6 nanowires as ideal conduits for spin-polarized currents.

preprint2022arXiv

The Microscopic Diamond Anvil Cell: Stabilization of Superhard, Superconducting Carbon Allotropes at Ambient Pressure

A metallic covalently bonded carbon allotrope is predicted via first principles calculations. It is composed of an $sp^3$ carbon framework that acts as a diamond anvil cell by constraining the distance between parallel cis-polyacetylene chains. The distance between these $sp^2$ carbon atoms renders the phase metallic, and yields two well-nested nearly parallel bands that span the Fermi level. Calculations show that this phase is a conventional superconductor, with the motions of the $sp^2$ carbons being key contributors to the electron phonon coupling. The $sp^3$ carbon atoms impart superior mechanical properties, with a predicted Vickers hardness of 48~GPa. This phase, metastable at ambient conditions, could be made via cold compression of graphite to 40~GPa. A family of multifunctional materials with tunable superconducting and mechanical properties could be derived from this phase by varying the $sp^2$ versus $sp^3$ carbon content and by doping.

preprint2022arXiv

Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Only parts of unlabeled data are selected to train models for most semi-supervised learning methods, whose confidence scores are usually higher than the pre-defined threshold (i.e., the confidence margin). We argue that the recognition performance should be further improved by making full use of all unlabeled data. In this paper, we learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition. All unlabeled samples are partitioned into two subsets by comparing their confidence scores with the adaptively learned confidence margin at each training epoch: (1) subset I including samples whose confidence scores are no lower than the margin; (2) subset II including samples whose confidence scores are lower than the margin. For samples in subset I, we constrain their predictions to match pseudo labels. Meanwhile, samples in subset II participate in the feature-level contrastive objective to learn effective facial expression features. We extensively evaluate Ada-CM on four challenging datasets, showing that our method achieves state-of-the-art performance, especially surpassing fully-supervised baselines in a semi-supervised manner. Ablation study further proves the effectiveness of our method. The source code is available at https://github.com/hangyu94/Ada-CM.

preprint2022arXiv

Zeroth-Order Randomized Subspace Newton Methods

Zeroth-order methods have become important tools for solving problems where we have access only to function evaluations. However, the zeroth-order methods only using gradient approximations are $n$ times slower than classical first-order methods for solving n-dimensional problems. To accelerate the convergence rate, this paper proposes the zeroth order randomized subspace Newton (ZO-RSN) method, which estimates projections of the gradient and Hessian by random sketching and finite differences. This allows us to compute the Newton step in a lower dimensional subspace, with small computational costs. We prove that ZO-RSN can attain lower iteration complexity than existing zeroth order methods for strongly convex problems. Our numerical experiments show that ZO-RSN can perform black-box attacks under a more restrictive limit on the number of function queries than the state-of-the-art Hessian-aware zeroth-order method.

preprint2021arXiv

A Little Bit of Carbon Can do a Lot for Superconductivity in H$_3$S

First-principles calculations were carried out to provide a chemical basis for proposed structures associated with the recently reported room-temperature superconductivity in a carbonaceous sulfur hydride material under pressure. Calculations were performed on supercells of H$_3$S doped with 1.85-25\% carbon, corresponding to SH$_3$~$\rightarrow$~CH$_3$ or SH$_3$~$\rightarrow$~CH$_4$ substitutions, primarily at pressures of 270 GPa where the maximum critical temperature, $T_c$, has been reported. In the first type of substitution, the carbon atoms can be six-fold coordinated, stabilizing a CH$_6$ configuration within the cubic H$_3$S framework structure that forms under pressure. In the second, the carbon can be four-fold coordinated as methane intercalated into the H-S lattice, with or without an additional hydrogen in the framework. The results indicate that unusual local bonding configurations with respect to carbon can be stabilized under pressure. The doping breaks degenerate bands, lowering the density of states at the Fermi level ($N_F$), and localizing electrons in C-H bonds. Low levels of CH$_4$ doping do not increase $N_F$ to values as high as those calculated for $Im\bar{3}m$ H$_3$S, but they do result in a larger logarithmic average phonon frequency, and an electron-phonon coupling parameter comparable to that of $R3m$ H$_3$S. The $T_c$s estimated for carbon doping levels ranging from 1.85-5.7\% are compatible with experimental measurements for the C-S-H superconductor.

preprint2021arXiv

On the Convergence of Step Decay Step-Size for Stochastic Optimization

The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide the convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an $\mathcal{O}(\ln T/\sqrt{T})$ rate. We also provide the convergence guarantees for general (possibly non-smooth) convex problems, ensuring an $\mathcal{O}(\ln T/\sqrt{T})$ convergence rate. Finally, in the strongly convex case, we establish an $\mathcal{O}(\ln T/T)$ rate for smooth problems, which we also prove to be tight, and an $\mathcal{O}(\ln^2 T /T)$ rate without the smoothness assumption. We illustrate the practical efficiency of the step decay step-size in several large scale deep neural network training tasks.

preprint2021arXiv

SEPAL: Towards a Large-scale Analysis of SEAndroid Policy Customization

To investigate the status quo of SEAndroid policy customization, we propose SEPAL, a universal tool to automatically retrieve and examine the customized policy rules. SEPAL applies the NLP technique and employs and trains a wide&deep model to quickly and precisely predict whether one rule is unregulated or not.Our evaluation shows SEPAL is effective, practical and scalable. We verify SEPAL outperforms the state of the art approach (i.e., EASEAndroid) by 15% accuracy rate on average. In our experiments, SEPAL successfully identifies 7,111 unregulated policy rules with a low false positive rate from 595,236 customized rules (extracted from 774 Android firmware images of 72 manufacturers). We further discover the policy customization problem is getting worse in newer Android versions (e.g., around 8% for Android 7 and nearly 20% for Android 9), even though more and more efforts are made. Then, we conduct a deep study and discuss why the unregulated rules are introduced and how they can compromise user devices. Last, we report some unregulated rules to seven vendors and so far four of them confirm our findings.

preprint2020arXiv

A diagnosis of explicit symmetry breaking in the tight-binding constructions for symmetry-protected topological systems

It has been well established that for symmetry protected topological systems, the non-trivial topology prevents a real space representation using exponentially localized Wannier wavefunctions (WFs) in all directions, unless the protecting symmetry is sacrificed as an on-site transformation. This makes it challenging to determine the symmetry of various physical observables represented using such WFs. In this work, we propose a practical method for overcoming such challenges using the Kane-Mele model as a concrete example. We present a systematic procedure for diagnosing the symmetry of any observables, as well as a method for constructing symmetric operators up to arbitrary truncation accuracy.

preprint2020arXiv

A Metastable CaSH$_3$ Phase Composed of HS Honeycomb Sheets that is Superconducting Under Pressure

Evolutionary searches predicted a number of ternary phases that could be synthesized at pressures of 100-300~GPa. $P6_3/mmc$ CaSH$_2$, $Pnma$ CaSH$_2$, $Cmc2_1$ CaSH$_6$, and $I\bar{4}$ CaSH$_{20}$ were composed of a Ca-S lattice along with H$_2$ molecules coordinated in a ``side-on'' fashion to Ca. The H-H bond lengths in these semiconducting phases were elongated because of H$_2$ $σ\rightarrow$ Ca d donation, and Ca d$\rightarrow$ H$_2$ $σ^*$ back-donation, via a Kubas-like mechanism. $P\bar{6}m2$ CaSH$_3$, consisting of two-dimensional HS and CaH$_2$ sheets, was metastable and metallic above \edit{128~GPa}. The presence of van Hove singularities increased its density of states at the Fermi level, and concomitantly the superconducting critical temperature, which was estimated to be as high as \edit{$\sim$100~K at 128~GPa}. This work will inspire the search for superconductivity in materials based upon \edit{honeycomb HX (X=S, Se, Te), and MH$_2$ (M=Mg, Ca, Sr, Ba) layers} under pressure.

preprint2020arXiv

A Simple and Effective Framework for Pairwise Deep Metric Learning

Deep metric learning (DML) has received much attention in deep learning due to its wide applications in computer vision. Previous studies have focused on designing complicated losses and hard example mining methods, which are mostly heuristic and lack of theoretical understanding. In this paper, we cast DML as a simple pairwise binary classification problem that classifies a pair of examples as similar or dissimilar. It identifies the most critical issue in this problem--imbalanced data pairs. To tackle this issue, we propose a simple and effective framework to sample pairs in a batch of data for updating the model. The key to this framework is to define a robust loss for all pairs over a mini-batch of data, which is formulated by distributionally robust optimization. The flexibility in constructing the uncertainty decision set of the dual variable allows us to recover state-of-the-art complicated losses and also to induce novel variants. Empirical studies on several benchmark data sets demonstrate that our simple and effective method outperforms the state-of-the-art results. Codes are available at: https://github.com/qiqi-helloworld/A-Simple-and-Effective-Framework-for-Pairewise-Distance-Metric-Learning

preprint2020arXiv

AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite

Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics, and tools. Unfortunately, modern workloads like Big data, AI, and Internet services dwarf the traditional one in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which, we propose the guideline for building end-to-end benchmarks, and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite---AIBench against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site \url{http://www.benchcouncil.org/AIBench/index.html}.

preprint2020arXiv

Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics

Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped Langevin diffusion, one gets a non-reversible diffusion that converges to the same stationary distribution with a faster convergence rate. In this paper, we study the non reversible Stochastic Gradient Langevin Dynamics (NSGLD) which is based on discretization of the non-reversible Langevin diffusion. We provide finite-time performance bounds for the global convergence of NSGLD for solving stochastic non-convex optimization problems. Our results lead to non-asymptotic guarantees for both population and empirical risk minimization problems. Numerical experiments for Bayesian independent component analysis and neural network models show that NSGLD can outperform SGLD with proper choices of the anti-symmetric matrix.

preprint2020arXiv

Robust Partial Matching for Person Search in the Wild

Various factors like occlusions, backgrounds, etc., would lead to misaligned detected bounding boxes , e.g., ones covering only portions of human body. This issue is common but overlooked by previous person search works. To alleviate this issue, this paper proposes an Align-to-Part Network (APNet) for person detection and re-Identification (reID). APNet refines detected bounding boxes to cover the estimated holistic body regions, from which discriminative part features can be extracted and aligned. Aligned part features naturally formulate reID as a partial feature matching procedure, where valid part features are selected for similarity computation, while part features on occluded or noisy regions are discarded. This design enhances the robustness of person search to real-world challenges with marginal computation overhead. This paper also contributes a Large-Scale dataset for Person Search in the wild (LSPS), which is by far the largest and the most challenging dataset for person search. Experiments show that APNet brings considerable performance improvement on LSPS. Meanwhile, it achieves competitive performance on existing person search benchmarks like CUHK-SYSU and PRW.

preprint2020arXiv

Stochastic Optimization for Non-convex Inf-Projection Problems

In this paper, we study a family of non-convex and possibly non-smooth inf-projection minimization problems, where the target objective function is equal to minimization of a joint function over another variable. This problem include difference of convex (DC) functions and a family of bi-convex functions as special cases. We develop stochastic algorithms and establish their first-order convergence for finding a (nearly) stationary solution of the target non-convex function under different conditions of the component functions. To the best of our knowledge, this is the first work that comprehensively studies stochastic optimization of non-convex inf-projection minimization problems with provable convergence guarantee. Our algorithms enable efficient stochastic optimization of a family of non-decomposable DC functions and a family of bi-convex functions. To demonstrate the power of the proposed algorithms we consider an important application in variance-based regularization. Experiments verify the effectiveness of our inf-projection based formulation and the proposed stochastic algorithm in comparison with previous stochastic algorithms based on the min-max formulation for achieving the same effect.

preprint2020arXiv

Strong pairing in two dimensions: Pseudogaps, domes, and other implications

This paper addresses the transition from the normal to the superfluid state in strongly correlated two dimensional fermionic superconductors and Fermi gases. We arrive at the Berezinskii-Kosterlitz-Thouless (BKT) temperature $T_{\text{BKT}}$ as a function of \emph{attractive} pairing strength by associating it with the onset of "quasi-condensation" in the normal phase. Our approach builds on a criterion for determining the BKT transition temperature for atomic gases which is based on a well established Quantum Monte Carlo analysis of the phase space density. This latter quantity, when derived from BCS-BEC crossover theory for fermions, leads to non-monotonic behavior for $T_{\text{BKT}}$ as a function of the attractive interaction or inverse scattering length. In Fermi gases, this implies a robust superconducting dome followed by a long tail from the flat BEC asymptote, rather similar to what is observed experimentally. For lattice systems we find that $T_{\text{BKT}}$ has an absolute maximum of the order of $0.1 E_F$. We discuss how our results compare with those derived from the Nelson Kosterlitz criterion based on the mean field superfluid density and the approach to the transition from below. While there is agreement in the strict mean-field BCS regime at weak coupling, we find that at moderate pairing strength bosonic excitations cause a substantial increase in $T_\text{BKT}$ followed by an often dramatic decrease before the system enters the BEC regime.

preprint2020arXiv

The Collins asymmetry in electroproduction of Kaon at the electron ion colliders within TMD factorization

We apply the transverse momentum dependent factorization formalism to investigate the transverse single spin dependent Collins asymmetry with a $\sin(ϕ_h+ϕ_s)$ modulation in the semi-inclusive production of Kaon meson in deep inelastic scattering process. The asymmetry is contributed by the convolutions of the transversity distribution function $h_1(x)$ of the target proton and the Collins function of the Kaon in the final state. We adopt the available parametrization of $h_1(x)$ as well as the recent extracted result for the Kaon Collins function. To perform the transverse momentum dependent evolution, the parametrization of the nonperturbative Sudakov form factor of the proton and final state Kaon are utilized. We numerically predict the Collins asymmetry for charged Kaon production at the electron ion colliders within the accuracy of next-to-leading-logarithmic order. It is found that the asymmetry is sizable and could be measured. We emphasize the importance of planned electron ion colliders in the aspect of constraining sea quark distribution functions as well as accessing the information of the nucleon spin structure and the hadronization mechanism.

preprint2020arXiv

Weakly Supervised Temporal Action Localization with Segment-Level Labels

Temporal action localization presents a trade-off between test performance and annotation-time cost. Fully supervised methods achieve good performance with time-consuming boundary annotations. Weakly supervised methods with cheaper video-level category label annotations result in worse performance. In this paper, we introduce a new segment-level supervision setting: segments are labeled when annotators observe actions happening here. We incorporate this segment-level supervision along with a novel localization module in the training. Specifically, we devise a partial segment loss regarded as a loss sampling to learn integral action parts from labeled segments. Since the labeled segments are only parts of actions, the model tends to overfit along with the training process. To tackle this problem, we first obtain a similarity matrix from discriminative features guided by a sphere loss. Then, a propagation loss is devised based on the matrix to act as a regularization term, allowing implicit unlabeled segments propagation during training. Experiments validate that our method can outperform the video-level supervision methods with almost same the annotation time.

preprint2019arXiv

Nematic state in CeAuSb$_{2}$

At ambient pressure and zero field, tetragonal CeAuSb$_{2}$ hosts stripe antiferromagnetic order at $T_{N} = 6.3$ K. Here we first show via bulk thermodynamic probes and x-ray diffraction measurements that this magnetic order is connected with a structural phase transition to a superstructure which likely breaks $C_{4}$ symmetry, thus signaling nematic order. The temperature-field-pressure phase diagram of CeAuSb$_{2}$ subsequently reveals the emergence of additional ordered states under applied pressure at a multicritical point. Our phenomenological model supports the presence of a vestigial nematic phase in CeAuSb$_{2}$ akin to iron-based high-temperature superconductors; however, superconductivity, if present, remains to be discovered.

preprint2019arXiv

The $\sin (2ϕ-ϕ_S)$ azimuthal asymmetry in the pion induced Drell-Yan process within TMD factorization

We investigate the single transverse-spin asymmetry with a $\sin (2ϕ-ϕ_S)$ modulation in the pion-induced Drell-Yan process within the theoretical framework of the transverse momentum dependent factorization. The asymmetry is contributed by the convolution of the Boer-Mulders function and the transversity. We adopt the model results for the distributions of the pion meson from the light-cone wavefunction approach and the available parametrization for the distributions of the proton to numerically estimate the $\sin (2ϕ-ϕ_S)$ asymmetry in $π^- p$ Drell-Yan at the kinematics of the COMPASS at CERN. To implement the TMD evolution formalism of parton distribution functions, we apply the recently extracted nonperturbative Sudakov form factor associated with the distribution functions of the proton and the pion. It is found that our prediction on the single transverse-spin dependent asymmetry ${\sin (2ϕ-ϕ_S)}$ as functions of $x_p$, $x_π$, $x_F$ and $q_\perp$ is qualitatively consistent with the recent COMPASS measurement in both sign and magnitude.