Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
57 works
0 followers
33 topics
4 close collaborators

Published work

57 published item(s)

preprint · 2026 · arXiv

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

Visual encoding constitutes a major computational bottleneck in Multimodal Large Language Models (MLLMs), especially for high-resolution image inputs. The prevailing practice typically adopts global encoding followed by post-ViT compression. Global encoding produces massive token sequences, while post-ViT compression incurs the full quadratic attention cost of the ViT before any token reduction takes place. In this work, we revisit this convention along two dimensions: the encoding strategy and visual token compression. First, controlled experiments show that slice-based encoding outperforms global encoding across benchmarks, suggesting that preserving local details through sliced views can be more beneficial than applying global attention for fine-grained perception. Second, we introduce intra-ViT early compression, which reduces tokens in shallow ViT layers and substantially lowers visual-encoding FLOPs while preserving downstream performance. By integrating intra-ViT compression into the slice-based encoding framework, we present LLaVA-UHD v4, an efficient and compute-controllable visual encoding scheme tailored for high-resolution inputs. Across a diverse set of benchmarks covering document understanding, OCR, and general VQA, LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% while matching or even surpassing baseline performance. These results suggest that visual-encoding efficiency can be substantially improved without sacrificing downstream performance, providing a practical design direction for efficient high-resolution MLLMs. All model weights and code will be publicly released to support further research.
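The cost argument in this abstract can be illustrated with a rough back-of-the-envelope sketch. It is not the paper's implementation: the per-layer cost model (projections, attention, and a 4x MLP), the function names `layer_cost` and `encoder_cost`, and every numeric setting below are assumptions chosen only to show why reducing tokens in shallow layers saves more than compressing after the last layer.

```python
from typing import Optional


def layer_cost(n_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one Transformer layer (assumed cost model)."""
    projections = 4 * n_tokens * dim * dim     # Q, K, V and output projections
    attention = 2 * n_tokens * n_tokens * dim  # QK^T scores plus weighted sum over V (quadratic in tokens)
    mlp = 8 * n_tokens * dim * dim             # two linear layers with 4x hidden expansion
    return projections + attention + mlp


def encoder_cost(n_tokens: int, dim: int, n_layers: int,
                 compress_at: Optional[int] = None,
                 keep_ratio: float = 1.0) -> int:
    """Total cost when the token count is cut to keep_ratio before layer `compress_at`.

    compress_at=None models post-ViT compression: every layer sees all tokens.
    """
    total = 0
    n = n_tokens
    for layer in range(n_layers):
        if compress_at is not None and layer == compress_at:
            n = max(1, int(n * keep_ratio))    # hypothetical token reduction step
        total += layer_cost(n, dim)
    return total


# Hypothetical settings: 1024 tokens, width 1024, 24 layers.
full = encoder_cost(1024, 1024, 24)
early = encoder_cost(1024, 1024, 24, compress_at=4, keep_ratio=0.25)
print(f"relative cost with early compression: {early / full:.2f}")
```

Under this toy model the saving depends entirely on the assumed layer index and keep ratio, so the printed ratio should not be compared with the 55.8% figure reported in the abstract.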

preprint · 2026 · arXiv

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps by real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real-time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computation efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices with less than 12GB RAM cost.

preprint · 2026 · arXiv

Uncertainty Quantification for LLM-based Code Generation

Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular, large language model (LLM) based code generation, remains a challenging problem. An existing attempt proposes PAC prediction sets but is limited by its strong monotonicity assumption on risk and single-label classification framework, which severely limits the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose an approach RisCoSet that leverages multiple hypothesis testing to construct risk-controlling predictions for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state-of-the-art, our method can significantly reduce the code removal by up to 24.5%, at the same level of risk.
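As a generic illustration of how multiple hypothesis testing can yield risk control, the sketch below tests a handful of candidate configurations on calibration data and keeps those whose risk is certified below a target level. It is not RisCoSet itself: the abstract does not specify the test, so the Hoeffding p-value, the Bonferroni correction, and the names `hoeffding_p_value` and `risk_controlling_configs` are assumptions, and losses are assumed to lie in [0, 1].

```python
import math


def hoeffding_p_value(emp_risk: float, n: int, alpha: float) -> float:
    """p-value for the null hypothesis 'true risk > alpha', for losses bounded in [0, 1]."""
    gap = max(0.0, alpha - emp_risk)
    return math.exp(-2.0 * n * gap * gap)


def risk_controlling_configs(losses_per_config, alpha: float, delta: float):
    """Return indices of configurations certified to have risk <= alpha.

    losses_per_config: one list of calibration losses per candidate configuration.
    With a Bonferroni correction, all returned configurations are simultaneously
    valid with probability at least 1 - delta.
    """
    m = len(losses_per_config)
    valid = []
    for idx, losses in enumerate(losses_per_config):
        n = len(losses)
        emp_risk = sum(losses) / n
        if hoeffding_p_value(emp_risk, n, alpha) <= delta / m:
            valid.append(idx)
    return valid


# Hypothetical calibration data: config 0 never fails, config 1 fails half the time.
calibration = [[0.0] * 500, [0.0, 1.0] * 250]
print(risk_controlling_configs(calibration, alpha=0.1, delta=0.05))
```

In a code-generation setting, a configuration could correspond to how aggressively uncertain parts of a program are removed, and the loss to whether the resulting partial program still admits a correct completion; that mapping is also an assumption here.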

preprint · 2024 · arXiv

CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

When exploring the development of Artificial General Intelligence (AGI), a critical task for these models involves interpreting and processing information from multiple image inputs. However, Large Multimodal Models (LMMs) encounter two issues in such scenarios: (1) a lack of fine-grained perception, and (2) a tendency to blend information across multiple images. We first extensively investigate the capability of LMMs to perceive fine-grained visual details when dealing with multiple input images. The research focuses on two aspects: first, image-to-image matching (to evaluate whether LMMs can effectively reason and pair relevant images), and second, multi-image-to-text matching (to assess whether LMMs can accurately capture and summarize detailed image information). We conduct evaluations on a range of both open-source and closed-source large models, including GPT-4V, Gemini, OpenFlamingo, and MMICL. To enhance model performance, we further develop a Contrastive Chain-of-Thought (CoCoT) prompting approach based on multi-input multimodal models. This method requires LMMs to compare the similarities and differences among multiple image inputs, and then guides the models to answer detailed questions about multi-image inputs based on the identified similarities and differences. Our experimental results showcase CoCoT's proficiency in enhancing the multi-image comprehension capabilities of large multimodal models.

preprint · 2024 · arXiv

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars. Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalanced viewing angles and imprecise pose priors, our approach aims to develop a zero-shot 3D generative scheme capable of producing visually realistic, geometrically accurate and content-wise diverse 3D humans without relying on pre-existing 3D or 2D assets. To address this challenge, we introduce a meticulously crafted workflow that implements accurate physical modeling to learn the enhanced 3D generative model from synthetic 2D data. During inference, we integrate optimization modules to bridge the gap between realistic appearances and coarse 3D shapes. Specifically, En3D comprises three modules: a 3D generator that accurately models generalizable 3D humans with realistic appearance from synthesized balanced, diverse, and structured human images; a geometry sculptor that enhances shape quality using multi-view normal constraints for intricate human anatomy; and a texturing module that disentangles explicit texture maps with fidelity and editability, leveraging semantical UV partitioning and a differentiable rasterizer. Experimental results show that our approach significantly outperforms prior works in terms of image quality, geometry accuracy and content diversity. We also showcase the applicability of our generated avatars for animation and editing, as well as the scalability of our approach for content-style free adaptation.

preprint · 2023 · arXiv

Duality viewpoint of criticality

In this work, we study quantum many-body systems which are self-dual under duality transformation connecting different symmetry protected topological (SPT) phases. We provide a geometric explanation of the criticality of these self-dual models. More precisely, we show a ground state (quasi-)degeneracy under the periodic boundary conditions, i.e., the ingappability of the bulk spectrum. Equivalently, the symmetry group at criticality, including the duality symmetry, has a mixed 't Hooft anomaly. This approach can not only predict the spectrum of the self-dual model with ordinary 0-form symmetry, but also be applied to that with generalized symmetry, such as higher form and subsystem symmetry. As an application, we illustrate our results with several examples in one and two dimensions, which separate two different SPTs.

preprint · 2023 · arXiv

The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments. This paper conducts a comprehensive investigation into the effective utilization of LLMs for repairing CR defects. In this study, various prompts are designed and compared across mainstream LLMs using two distinct datasets from human reviewers and automated checkers. Experimental results demonstrate a remarkable repair rate of 72.97% with the best prompt, highlighting a substantial improvement in the effectiveness and practicality of automatic repair techniques.

preprint · 2022 · arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and in applying BMs to many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. For each topic, we clearly summarize the current studies and propose some future research directions. At the end of this paper, we discuss the further development of BMs from a more general view.

preprint · 2022 · arXiv

A special cross-tie domain wall in helimagnet

A special cross-tie (SCT) domain wall was discovered in the helimagnet MnCoSi alloy via magnetic vector field tomography in Lorentz transmission electron microscopy (LTEM). Unlike the traditional cross-tie (TCT) domain wall, where the convergent/divergent magnetic moment configurations line up one by one, relatively large Bloch-type sub-walls emerge in this new SCT domain wall, and two mutually perpendicular rotation axes coexist in this special feature. The straight magnetic stripes accompanied by the unraveled domain walls hint at the complex mechanism that forms this SCT structure. Interestingly, different orientations of this domain wall in LTEM can readily exhibit various magnetic features, including meron/antimeron chains or bimeron strings.

preprint · 2022 · arXiv

Capacity Analysis of Holographic MIMO Channels with Practical Constraints

Holographic Multiple-Input and Multiple-Output (MIMO) is envisioned as a promising technology to realize unprecedented spectral efficiency by integrating a large number of antennas into a compact space. Most research on holographic MIMO is based on isotropic scattering environments, and the antenna gain is assumed to be unlimited by deployment space. However, the channel might not satisfy isotropic scattering because of generalized angle distributions, and the antenna gain is limited by the array aperture in reality. In this letter, we aim to analyze the holographic MIMO channel capacity under practical angle distribution and array aperture constraints. First, we calculate the spectral density for generalized angle distributions by introducing a wavenumber domain-based method. And then, the capacity under generalized angle distributions is analyzed and two different aperture schemes are considered. Finally, numerical results show that the capacity is obviously affected by angle distribution at high signal-to-noise ratio (SNR) but hardly affected at low SNR, and the capacity will not increase infinitely with antenna density due to the array aperture constraint.
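The low-SNR versus high-SNR observation can be illustrated with the textbook equal-power MIMO capacity formula, C = sum_i log2(1 + (SNR / N) * lambda_i), where lambda_i are the eigenvalues of the channel Gram matrix. This is only a sketch under stated assumptions: it does not reproduce the letter's wavenumber-domain spectral density, and the two eigenvalue profiles (same total gain, different spread) are invented stand-ins for an isotropic and a concentrated angle distribution.

```python
import math


def equal_power_capacity(eigenvalues, snr: float) -> float:
    """Capacity in bit/s/Hz with transmit power split equally over len(eigenvalues) modes."""
    n = len(eigenvalues)
    return sum(math.log2(1.0 + snr * lam / n) for lam in eigenvalues)


# Hypothetical eigenvalue profiles with the same trace (total channel gain of 4).
isotropic_like = [1.0, 1.0, 1.0, 1.0]   # gain spread evenly over all modes
concentrated = [3.7, 0.1, 0.1, 0.1]     # gain concentrated in one dominant mode

for snr in (0.01, 1000.0):
    c_iso = equal_power_capacity(isotropic_like, snr)
    c_con = equal_power_capacity(concentrated, snr)
    print(f"SNR={snr}: isotropic-like={c_iso:.4f}, concentrated={c_con:.4f}")
```

At low SNR each term is approximately linear in lambda_i, so the sum depends mainly on the total gain; at high SNR the number of strong modes dominates, which is why the two profiles diverge.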

preprint · 2022 · arXiv

Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer

Backdoor attacks have been shown to be a serious security threat against deep learning models, and detecting whether a given model has been backdoored becomes a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually of small size or affects the activation of only a few neurons. However, the above observations are violated in many cases especially for advanced backdoor attacks, hindering the performance and applicability of the existing defenses. In this paper, we propose a backdoor defense DTInspector built upon a new observation. That is, an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with a high probability. Based on this observation, DTInspector first learns a patch that could change the predictions of most high-confidence data, and then decides the existence of backdoor by checking the ratio of prediction changes after applying the learned patch on the low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attacking types demonstrate the effectiveness of the proposed defense.

preprint · 2022 · arXiv

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

Pre-Trained Vision-Language Models (VL-PTMs) have shown promising capabilities in grounding natural language in image data, facilitating a broad variety of cross-modal tasks. However, we note that there exists a significant gap between the objective forms of model pre-training and fine-tuning, resulting in a need for large amounts of labeled data to stimulate the visual grounding capability of VL-PTMs for downstream tasks. To address the challenge, we present Cross-modal Prompt Tuning (CPT, alternatively, Colorful Prompt Tuning), a novel paradigm for tuning VL-PTMs, which reformulates visual grounding into a fill-in-the-blank problem with color-based co-referential markers in image and text, maximally mitigating the gap. In this way, CPT enables strong few-shot and even zero-shot visual grounding capabilities of VL-PTMs. Comprehensive experimental results show that the prompt-tuned VL-PTMs outperform their fine-tuned counterparts by a large margin (e.g., 17.3% absolute accuracy improvement, and 73.8% relative standard deviation reduction on average with one shot in RefCOCO evaluation). We make the data and code for this paper publicly available at https://github.com/thunlp/CPT.

preprint · 2022 · arXiv

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained by partial observations (i.e., stylized heads). Few-shot learning based style transfer is challenging since the learned model can easily become overfitted in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to handle the challenge by adopting the key idea of "calibration first, translation later" and exploring the augmented global structure with locally-focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority over the state of the art in head stylization and its effectiveness on full image translation with adaptive deformations.

preprint · 2022 · arXiv

Detecting Topology Attacks against Graph Neural Networks

Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.
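One simple way to picture a "neighborhood variance" score is sketched below. The abstract does not give the actual measurement, so this definition (the per-class variance of neighbors' predicted probabilities, summed over classes), the function name `neighborhood_variance`, and the toy graph are all assumptions. The intuition it tries to capture is that a node whose neighbors pull toward different labels gets a high score.

```python
def neighborhood_variance(adjacency, probs):
    """Score each node by how much its neighbors' class predictions disagree.

    adjacency: dict mapping node -> list of neighbor nodes
    probs:     dict mapping node -> list of predicted class probabilities
    """
    scores = {}
    for node, neighbors in adjacency.items():
        if len(neighbors) < 2:
            scores[node] = 0.0          # variance is not informative for fewer than 2 neighbors
            continue
        n_classes = len(probs[neighbors[0]])
        total = 0.0
        for c in range(n_classes):
            values = [probs[v][c] for v in neighbors]
            mean = sum(values) / len(values)
            total += sum((x - mean) ** 2 for x in values) / len(values)
        scores[node] = total
    return scores


# Toy example: node "a" has neighbors that disagree, node "d" has neighbors that agree.
toy_probs = {"b": [1.0, 0.0], "c": [0.0, 1.0], "e": [1.0, 0.0]}
toy_adjacency = {"a": ["b", "c"], "d": ["b", "e"]}
print(neighborhood_variance(toy_adjacency, toy_probs))
```

A detector built on such a score would still need a threshold or ranking rule, which is left open here.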

preprint · 2022 · arXiv

Equiangular lines with a fixed angle

Solving a longstanding problem on equiangular lines, we determine, for each given fixed angle and in all sufficiently large dimensions, the maximum number of lines pairwise separated by the given angle. Fix $0 < \alpha < 1$. Let $N_\alpha(d)$ denote the maximum number of lines through the origin in $\mathbb{R}^d$ with pairwise common angle $\arccos \alpha$. Let $k$ denote the minimum number (if it exists) of vertices in a graph whose adjacency matrix has spectral radius exactly $(1-\alpha)/(2\alpha)$. If $k < \infty$, then $N_\alpha(d) = \lfloor k(d-1)/(k-1) \rfloor$ for all sufficiently large $d$, and otherwise $N_\alpha(d) = d + o(d)$. In particular, $N_{1/(2k-1)}(d) = \lfloor k(d-1)/(k-1) \rfloor$ for every integer $k\ge 2$ and all sufficiently large $d$. A key ingredient is a new result in spectral graph theory: the adjacency matrix of a connected bounded degree graph has sublinear second eigenvalue multiplicity.
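The closed-form count is easy to evaluate. The short sketch below computes the spectral-radius target (1 - alpha) / (2 * alpha) exactly and the value floor(k(d-1)/(k-1)); the function names are illustrative. For alpha = 1/(2k-1) the target equals k - 1, which is the spectral radius of the complete graph on k vertices, consistent with the special case stated in the abstract. The formula only gives the true maximum once d is sufficiently large, a threshold the abstract does not quantify.

```python
from fractions import Fraction


def spectral_radius_target(alpha: Fraction) -> Fraction:
    """The spectral radius a graph must have exactly: (1 - alpha) / (2 * alpha)."""
    return (1 - alpha) / (2 * alpha)


def equiangular_formula(k: int, d: int) -> int:
    """Evaluate floor(k(d-1)/(k-1)); meaningful as N_alpha(d) only for sufficiently large d."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return (k * (d - 1)) // (k - 1)


# alpha = 1/(2k-1) gives a target of exactly k - 1.
for k in (2, 3, 4):
    alpha = Fraction(1, 2 * k - 1)
    print(k, alpha, spectral_radius_target(alpha), equiangular_formula(k, 101))
```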

preprint · 2022 · arXiv

Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces

The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge number of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation have been developed, they are expensive because, as backward selection methods, they require fully training a dense network, and there is still a void in systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils "winning tickets" in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, which are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating good performance with much less computational cost. Codes and models can be downloaded at https://github.com/DessiLBI2020/DessiLBI.

preprint · 2022 · arXiv

Fine-Grained Scene Graph Generation with Data Transfer

Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to several frequent but uninformative predicates (e.g., on, at), which limits practical application of these models in downstream tasks. To deal with the problems above, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large SGG with 1,807 predicate classes. Our IETrans tries to relieve the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. By training on the enhanced dataset, a Neural Motif model doubles the macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.

preprint · 2022 · arXiv

From Cascades to $J$-holomorphic Curves and Back

This paper develops the analysis needed to set up a Morse-Bott version of embedded contact homology (ECH) of a contact three-manifold in certain cases. In particular we establish a correspondence between "cascades" of holomorphic curves in the symplectization of a Morse-Bott contact form, and holomorphic curves in the symplectization of a nondegenerate perturbation of the contact form. The cascades we consider must be transversely cut out and rigid. We accomplish this by studying the adiabatic degeneration of $J$-holomorphic curves into cascades and establishing a gluing theorem. We note our gluing theorem satisfying appropriate transversality hypotheses should work in higher dimensions as well. The details of ECH applications will appear elsewhere.

preprint · 2022 · arXiv

Gappability Index for Quantum Many-Body Systems

We propose an index $\mathcal{I}_G$ which characterizes the degree of ingappability, namely the difficulty to induce a unique ground state with a nonvanishing excitation gap, in the presence of a symmetry $G$. $\mathcal{I}_G$ represents the dimension of the subspace of ambient uniquely-gapped in the entire $G$-invariant "theory space". The celebrated Lieb-Schultz-Mattis theorem corresponds, in our formulation, to the case $\mathcal{I}_G=0$ (completely ingappable) for the symmetry $G$ including the lattice translation symmetry. We illustrate the usefulness of the index by discussing the phase diagram of spin-$1/2$ antiferromagnets in various dimensions, which do not necessarily have the translation symmetry.

preprint · 2022 · arXiv

Gate-Level Side-Channel Leakage Assessment with Architecture Correlation Analysis

While side-channel leakage is traditionally evaluated from a fabricated chip, it is more time-efficient and cost-effective to do so during the design phase of the chip. We present a methodology to rank the gates of a design according to their contribution to the side-channel leakage of the chip. The methodology relies on logic synthesis, logic simulation, gate-level power estimation, and gate leakage assessment to compute a ranking. The ranking metric can be defined as a specific test by correlating gate-level activity with a leakage model, or else as a non-specific test by evaluating gate-level activity in response to distinct test vector groups. Our results show that only a minority of the gates in a design contribute most of the side-channel leakage. We demonstrate this property for several designs, including a hardware AES coprocessor and a cryptographic hardware/software interface in a five-stage pipelined RISC processor.
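The "specific test" described here, correlating gate-level activity with a leakage model, can be sketched as a Pearson-correlation ranking. This is a minimal illustration rather than the authors' tool flow: the gate names, the per-trace activity values, the leakage-model values, and the functions `pearson` and `rank_gates` are all hypothetical, and in practice the activity would come from logic simulation and gate-level power estimation.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_gates(gate_activity, leakage_model):
    """Order gate names by absolute correlation with the leakage model, strongest first.

    gate_activity: dict mapping gate name -> per-trace activity (e.g. toggle counts)
    leakage_model: per-trace predicted leakage (e.g. Hamming weight of an intermediate value)
    """
    scores = {gate: abs(pearson(activity, leakage_model))
              for gate, activity in gate_activity.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data for three gates over four test vectors.
activity = {"g1": [0, 1, 2, 3], "g2": [1, 0, 1, 0], "g3": [5, 5, 5, 5]}
model = [0, 1, 2, 3]
print(rank_gates(activity, model))
```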

preprint · 2022 · arXiv

Generative Adversarial Networks for Robust Cryo-EM Image Denoising

Cryo-electron microscopy (Cryo-EM) has become popular for macromolecular structure determination. However, the 2D images that Cryo-EM produces are very noisy and often mixed with multiple heterogeneous conformations or contamination, which makes denoising challenging. Traditional image denoising methods cannot remove Cryo-EM image noise well when the signal-to-noise ratio (SNR) of the images is low. It is therefore desirable to develop new, effective denoising techniques to facilitate further research such as 3D reconstruction and 2D conformation classification. In this paper, we approach the robust image denoising problem in Cryo-EM with a joint Autoencoder and Generative Adversarial Networks (GAN) method. Equipped with a robust $\ell_1$ Autoencoder and some designs of robust $\beta$-GANs, one can stabilize the training of GANs and achieve state-of-the-art performance in robust denoising with low-SNR data and against possible information contamination. The method is evaluated on both a heterogeneous conformational dataset of the Thermus aquaticus RNA Polymerase (RNAP) and a homogeneous dataset of the Plasmodium falciparum 80S ribosome (EMPIAR-10028), in terms of Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), as well as heterogeneous conformation clustering. These results suggest that our proposed methodology provides an effective tool for Cryo-EM 2D image denoising. Our code is available at https://github.com/ghl1995/denoise-gan-in-cryo-em.
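Two of the evaluation metrics named in this abstract have short standard definitions: MSE is the mean of squared pixel differences, and PSNR = 10 * log10(MAX^2 / MSE). The sketch below implements them for images stored as nested lists; the function names and the assumption that pixel values lie in [0, 1] (so MAX = 1.0) are illustrative choices, not details taken from the paper's code.

```python
import math


def mse(image_a, image_b) -> float:
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(image_a, image_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def psnr(image_a, image_b, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(image_a, image_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 clean image and a denoised estimate that is off by 0.1 everywhere.
clean = [[0.0, 0.0], [0.0, 0.0]]
denoised = [[0.1, 0.1], [0.1, 0.1]]
print(mse(clean, denoised), psnr(clean, denoised))
```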

preprint · 2022 · arXiv

Geometric approach to Lieb-Schultz-Mattis theorem without translation symmetry under inversion or rotation symmetry

We propose a geometric approach to Lieb-Schultz-Mattis theorem for quantum many-body systems with discrete spin-rotation symmetries and lattice inversion or rotation symmetry, but without translation symmetry assumed. Under symmetry-twisting on a $(d-1)$-dimensional plane, we find that any $d$-dimensional inversion-symmetric spin system possesses a doubly degenerate spectrum when it hosts a half-integer spin at the inversion-symmetric point. We also show that any rotation-symmetric generalized spin model with a projective representation at the rotation center has a similar degeneracy under symmetry-twisting. We argue that these degeneracies imply that a unique symmetric gapped ground state that is smoothly connected to product states is forbidden in the original untwisted systems -- generalized inversional/rotational Lieb-Schultz-Mattis theorems without lattice translation symmetry imposed. The traditional Lieb-Schultz-Mattis theorems with translations also fit in the proposed framework.

preprint · 2022 · arXiv

Observation of short-period helical spin order and magnetic transition in a non-chiral centrosymmetric helimagnet

The search for materials exhibiting nanoscale spiral order continues to be fuelled by the promise of emergent inductors. Although such spin textures have been reported in many materials, most of them exhibit long periods or are limited to operate far below room temperature. Here, we present the real-space observation of an ordered helical spin order with a period of 3.2 nm in a non-chiral centrosymmetric helimagnet MnCoSi at room temperature via multi-angle and multi-azimuth approach of Lorentz transmission electron microscopy (TEM). A magnetic transition from the ordered helical spin order to a cycloidal spin order below 228 K is clearly revealed by in situ neutron powder diffraction and Lorentz TEM, which is closely correlated with temperature-induced variation in magneto-crystalline anisotropy. These results reveal the origin of spiral ordered spin textures in non-chiral centrosymmetric helimagnet, which can serve as a new strategy for searching materials with nanoscale spin order with potential applications in emergent electromagnetism.

preprint · 2022 · arXiv

On Private Online Convex Optimization: Optimal Algorithms in $\ell_p$-Geometry and High Dimensional Contextual Bandits

Differentially private (DP) stochastic convex optimization (SCO) is ubiquitous in trustworthy machine learning algorithm design. This paper studies the DP-SCO problem with streaming data that are sampled from a distribution and arrive sequentially. We also consider the continual release model, where parameters related to private information are updated and released upon each new data point, a setting often associated with online algorithms. Although numerous algorithms have been developed to achieve the optimal excess risks in different $\ell_p$ norm geometries, none of the existing ones can be adapted to the streaming and continual release setting. To address this challenge of online convex optimization with privacy protection, we propose a private variant of the online Frank-Wolfe algorithm with recursive gradients for variance reduction to update and reveal the parameters upon each data point. Combined with the adaptive differential privacy analysis, our online algorithm achieves in linear time the optimal excess risk when $1<p\leq 2$ and the state-of-the-art excess risk meeting the non-private lower ones when $2<p\leq\infty$. Our algorithm can also be extended to the case $p=1$ to achieve nearly dimension-independent excess risk. While previous variance reduction results on recursive gradients have theoretical guarantees only in the independent and identically distributed sample setting, we establish such a guarantee in a non-stationary setting. To demonstrate the virtues of our method, we design the first DP algorithm for high-dimensional generalized linear bandits with logarithmic regret. Comparative experiments with a variety of DP-SCO and DP-Bandit algorithms exhibit the efficacy and utility of the proposed algorithms.

preprint2022arXiv

Prompt Tuning for Discriminative Pre-trained Language Models

Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks. However, to the best of our knowledge, existing works focus on prompt-tuning generative PLMs that are pre-trained to generate target tokens, such as BERT. It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned. In this work, we present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem. Comprehensive experiments on text classification and question answering show that, compared with vanilla fine-tuning, DPT achieves significantly higher performance, and also prevents the instability problem in tuning large PLMs in both full-set and low-resource settings. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/DPT.

preprint2022arXiv

Structure-Aware Flow Generation for Human Body Reshaping

Body reshaping is an important procedure in portrait photo retouching. Due to the complicated structure and multifarious appearance of human bodies, existing methods either fall back on the 3D domain via a body morphable model or resort to keypoint-based image deformation, leading to inefficiency and unsatisfactory visual quality. In this paper, we address these limitations by formulating an end-to-end flow generation architecture under the guidance of body structural priors, including skeletons and Part Affinity Fields, and achieve unprecedentedly controllable performance under arbitrary poses and garments. A compositional attention mechanism is introduced for capturing both visual perceptual correlations and structural associations of the human body to reinforce the manipulation consistency among related parts. For a comprehensive evaluation, we construct the first large-scale body reshaping dataset, namely BR-5K, which contains 5,000 portrait photos as well as professionally retouched targets. Extensive experiments demonstrate that our approach significantly outperforms existing state-of-the-art methods in terms of visual performance, controllability, and efficiency. The dataset is available at our website: https://github.com/JianqiangRen/FlowBasedBodyReshaping.

preprint2022arXiv

Tracking the nematicity in cuprate superconductors: a resistivity study under uniaxial pressure

Overshadowing the superconducting dome in hole-doped cuprates, the pseudogap state remains a mystery on which no consensus has been reached. It has been suggested that the rotational symmetry is broken in this state and may result in a nematic phase transition, whose temperature seems to coincide with the onset temperature of the pseudogap state $T^*$ around the optimal doping level, raising the question of whether the pseudogap results from the establishment of the nematic order. Here we report results of resistivity measurements under uniaxial pressure on several hole-doped cuprates, where the normalized slope of the elastoresistivity $ζ$ can be obtained as illustrated in iron-based superconductors. The temperature dependence of $ζ$ along a particular lattice axis exhibits a kink feature at $T_{k}$ and shows Curie-Weiss-like behavior above it, which may suggest a spontaneous nematic transition. While $T_{k}$ seems to be the same as $T^*$ around the optimal doping and in the overdoped region, they become very different in underdoped La$_{2-x}$Sr$_{x}$CuO$_4$. Our results suggest that the nematic order, if it indeed exists, is an electronic phase within the pseudogap state.

preprint2022arXiv

Unsupervised Domain Adaptation through Shape Modeling for Medical Image Segmentation

Shape information is a strong and valuable prior in segmenting organs in medical images. However, most current deep learning based segmentation algorithms have not taken shape information into consideration, which can lead to bias towards texture. We aim at modeling shape explicitly and using it to help medical image segmentation. Previous methods proposed Variational Autoencoder (VAE) based models to learn the distribution of shape for a particular organ and used it to automatically evaluate the quality of a segmentation prediction by fitting it into the learned shape distribution. Building on this, we aim to incorporate the VAE into current segmentation pipelines. Specifically, we propose a new unsupervised domain adaptation pipeline based on a pseudo loss and a VAE reconstruction loss under a teacher-student learning paradigm. Both losses are optimized simultaneously and, in return, boost the segmentation task performance. Extensive experiments on three public Pancreas segmentation datasets as well as two in-house Pancreas segmentation datasets show consistent improvements with at least 2.8 points gain in the Dice score, demonstrating the effectiveness of our method in challenging unsupervised domain adaptation scenarios for medical image segmentation. We hope this work will advance shape analysis and geometric learning in medical imaging.

preprint2021arXiv

Evaluating Visual Properties via Robust HodgeRank

Nowadays, how to effectively evaluate visual properties has become a popular topic for fine-grained visual comprehension. In this paper we study the problem of how to estimate such visual properties from a ranking perspective with the help of the annotators from online crowdsourcing platforms. The main challenges of our task are two-fold. On one hand, the annotations often contain contaminated information, where a small fraction of label flips might ruin the global ranking of the whole dataset. On the other hand, considering the large data capacity, the annotations are often far from being complete. What is worse, there might even exist imbalanced annotations where a small subset of samples are frequently annotated. Facing such challenges, we propose a robust ranking framework based on the principle of Hodge decomposition of imbalanced and incomplete ranking data. According to the HodgeRank theory, we find that the major source of the contamination comes from the cyclic ranking component of the Hodge decomposition. This leads us to an outlier detection formulation as sparse approximations of the cyclic ranking projection. Taking a step further, it facilitates a novel outlier detection model as Huber's LASSO in robust statistics. Moreover, simple yet scalable algorithms are developed based on Linearized Bregman Iteration to achieve an even less biased estimator. Statistical consistency of outlier detection is established in both cases under nearly the same conditions. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ranking with large scale crowdsourcing data arising from computer vision.

preprint2021arXiv

Fast differentiable evolution of quantum states under Gaussian transformations

In a recent work we presented a recursive algorithm to compute the matrix elements of a generic Gaussian transformation in the photon-number basis. Its purpose was to evolve a quantum state by building the transformation matrix and subsequently computing the matrix-vector product. Here we present a faster algorithm that computes the final state without having to generate the full transformation matrix first. With this algorithm we bring the time complexity of computing the Gaussian evolution of an $N$-dimensional $M$-mode state from $O(MN^{2M})$ to $O(M(N^2/2)^M)$, which is an exponential improvement in the number of modes. In the special case of high squeezing, the evolved state can be approximated with complexity $O(MN^{M})$. Our new algorithm is differentiable, which means we can use it in conjunction with gradient-based optimizers for circuit optimization tasks. We benchmark our algorithm by optimizing circuits to produce single photons, Gottesman-Kitaev-Preskill states and NOON states, showing that it is up to one order of magnitude faster than the state of the art.
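The complexity claim above can be checked with a little arithmetic: dividing $M N^{2M}$ by $M (N^2/2)^M$ leaves a factor of $2^M$, which is where the exponential gain in the number of modes comes from. The sketch below only evaluates the two stated operation counts; the function names are illustrative assumptions, not taken from the paper, and constants hidden by the O-notation are ignored.

```python
from fractions import Fraction

def matrix_based_cost(N, M):
    # Stated count for building the full transformation matrix: M * N^(2M).
    return M * N ** (2 * M)

def direct_cost(N, M):
    # Stated count for the direct state evolution: M * (N^2 / 2)^M.
    # Fraction keeps the arithmetic exact when N^2 is odd.
    return M * Fraction(N * N, 2) ** M

def speedup(N, M):
    # Ratio of the two counts; algebraically this equals 2^M for any N.
    return Fraction(matrix_based_cost(N, M)) / direct_cost(N, M)

if __name__ == "__main__":
    for modes in (1, 2, 3, 4):
        print(modes, speedup(10, modes))
```

The ratio does not depend on the cutoff $N$, only on the number of modes $M$.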

preprint2021arXiv

Natural Gradient Optimization for Optical Quantum Circuits

Optical quantum circuits can be optimized using gradient descent methods, as the gates in a circuit can be parametrized by continuous parameters. However, the parameter space as seen by the cost function is not Euclidean, which means that the Euclidean gradient does not generally point in the direction of steepest ascent. In order to retrieve the steepest ascent direction, in this work we implement Natural Gradient descent in the optical quantum circuit setting, which takes the local metric tensor into account. In particular, we adapt the Natural Gradient approach to a complex-valued parameter space. We then compare the Natural Gradient approach to vanilla gradient descent and to Adam over two state preparation tasks: a single-photon source and a Gottesman-Kitaev-Preskill state source. We observe that the NG approach has a faster convergence (due in part to the possibility of using larger learning rates) and a significantly smoother decay of the cost function throughout the optimization.

preprint2021arXiv

On Stochastic Variance Reduced Gradient Method for Semidefinite Optimization

Low-rank stochastic semidefinite optimization has attracted rising attention due to its wide range of applications. The nonconvex reformulation based on the low-rank factorization significantly improves the computational efficiency but brings some new challenges to the analysis. The stochastic variance reduced gradient (SVRG) method has been regarded as one of the most effective methods. SVRG in general consists of two loops, where a reference full gradient is first evaluated in the outer loop and then used to yield a variance reduced estimate of the current gradient in the inner loop. Two options have been suggested to yield the output of the inner loop, where Option I sets the output as its last iterate, and Option II yields the output via random sampling from all the iterates in the inner loop. However, there is a significant gap between the theory and practice of SVRG when adapted to stochastic semidefinite programming (SDP). SVRG practically works better with Option I, while most existing theoretical results focus on Option II. In this paper, we fill this gap by exploiting a new semi-stochastic variant of the original SVRG with Option I adapted to semidefinite optimization. Equipped with this, we establish the global linear submanifold convergence (i.e., converging exponentially fast to a submanifold of a global minimum under the orthogonal group action) of the proposed SVRG method, given a provable initialization scheme and under certain smoothness and restricted strongly convex assumptions. Our analysis includes the effects of the mini-batch size and update frequency in the inner loop as well as two practical step size strategies, the fixed and stabilized Barzilai-Borwein step sizes. Some numerical results in matrix sensing demonstrate the efficiency of the proposed SVRG method, which outperforms its Option II counterpart as well as other methods.
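The two-loop structure and the two output options described above can be sketched on a toy problem. This is a minimal illustration of generic SVRG on a one-dimensional least-squares objective, not the semi-stochastic semidefinite variant proposed in the paper; the function name, the data, and the step size are all assumptions chosen for illustration.

```python
import random

def svrg(a, b, step=0.5, outer=20, inner=50, option="I", seed=0):
    """Generic SVRG on f(w) = mean_i 0.5 * (a[i] * w - b[i])**2 (toy sketch)."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(w, i):
        # Gradient of the i-th component function at w.
        return a[i] * (a[i] * w - b[i])

    w_ref = 0.0
    for _ in range(outer):
        # Outer loop: evaluate the reference full gradient at w_ref.
        full_grad = sum(grad_i(w_ref, i) for i in range(n)) / n
        w = w_ref
        iterates = []
        for _ in range(inner):
            # Inner loop: variance-reduced estimate of the current gradient.
            i = rng.randrange(n)
            w -= step * (grad_i(w, i) - grad_i(w_ref, i) + full_grad)
            iterates.append(w)
        # Option I: last inner iterate. Option II: a random inner iterate.
        w_ref = iterates[-1] if option == "I" else rng.choice(iterates)
    return w_ref

if __name__ == "__main__":
    a = [0.9, 1.0, 1.1]
    b = [2.0 * ai for ai in a]  # consistent data, so the minimizer is w = 2
    print(svrg(a, b, option="I"), svrg(a, b, option="II"))
```

On this toy problem both options approach the minimizer; the only difference between them is how the inner loop's output is selected.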

preprint2021arXiv

Particle-hole symmetry breaking in a spin-dimer system TlCuCl$_3$ observed at 100 T

The entire magnetization process of TlCuCl$_3$ has been experimentally investigated up to 100 T employing the single-turn technique. The upper critical field $H_{c2}$ is observed to be 86.1 T at 2 K. A convex slope of the $M$-$H$ curve between the lower and upper critical fields ($H_{c1}$ and $H_{c2}$) is clearly observed, which indicates that a particle-hole symmetry is broken in TlCuCl$_3$. By quantum Monte Carlo simulation and the bond-operator theory method, we find that the particle-hole symmetry breaking results from strong inter-dimer interactions.

preprint2021arXiv

Polyimide-Based Flexible Coupled-Coils Design and Load-Shift Keying Analysis

Wireless power transfer using inductive coupling is commonly used for medical implantable devices. The design of the secondary coil on the implantable device is important as it will affect the power transfer efficiency, the size of the implant, and also the data transmission between the implant and the in-vitro controller. In this paper, we present a design of the secondary coil on a polyimide-based flexible substrate to achieve high power transfer efficiency. Load shift keying modulation is used for the data communication between the primary and secondary coils. A thorough analysis is done for the ideal and practical scenario and it shows that a mismatched secondary LC tank will affect the communication range and communication correctness. A solution to achieve robust data transmission is proposed and then verified by SPICE simulations.

preprint2021arXiv

Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics

Margin enlargement over training data has been an important strategy since perceptrons in machine learning for the purpose of boosting the robustness of classifiers toward a good generalization ability. Yet Breiman (1999) showed a dilemma that a uniform improvement on margin distribution does NOT necessarily reduce generalization errors. In this paper, we revisit Breiman's dilemma in deep neural networks with recently proposed spectrally normalized margins, from a novel perspective based on phase transitions of normalized margin distributions in training dynamics. The normalized margin distribution of a classifier over the data can be divided into two parts: low/small margins, such as some negative margins for misclassified samples, vs. high/large margins for highly confident, correctly classified samples, which often behave differently during the training process. Low margins for training and test datasets are often effectively reduced in training, along with reductions of training and test errors, while high margins may exhibit different dynamics, reflecting the trade-off between expressive power of models and complexity of data. When data complexity is comparable to the model expressiveness, high margin distributions for both training and test data undergo similar decrease-increase phase transitions during training. In such cases, one can predict the trend of generalization or test error by margin-based generalization bounds with restricted Rademacher complexities, shown in two ways in this paper with early stopping time exploiting such phase transitions. On the other hand, over-expressive models may have both low and high training margins undergoing uniform improvements, with a distinct phase transition in test margin dynamics. This reconfirms Breiman's dilemma associated with overparameterized neural networks where margins fail to predict overfitting.

preprint2021arXiv

StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

The generation of stylish Chinese fonts is an important problem involved in many applications. Most existing generation methods are based on deep generative models, particularly generative adversarial network (GAN) based models. However, these deep generative models may suffer from the mode collapse issue, which significantly degrades the diversity and quality of generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and then incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. As a result we propose an efficient method called StrokeGAN, mainly motivated by the observation that the stroke encoding contains mode information of Chinese characters. In order to reconstruct the one-bit stroke encoding of the associated generated characters, we introduce a stroke-encoding reconstruction loss imposed on the discriminator. Equipped with such one-bit stroke encoding and stroke-encoding reconstruction loss, the mode collapse issue of CycleGAN can be significantly alleviated, with an improved preservation of strokes and diversity of generated characters. The effectiveness of StrokeGAN is demonstrated by a series of generation tasks over nine datasets with different fonts. The numerical results demonstrate that StrokeGAN generally outperforms the state-of-the-art methods in terms of content and recognition accuracies, as well as certain stroke error, and also generates more realistic characters.

preprint2021arXiv

UPRec: User-Aware Pre-training for Recommender Systems

Existing sequential recommendation methods rely on large amounts of training data and usually suffer from the data sparsity problem. To tackle this, the pre-training mechanism has been widely adopted, which attempts to leverage large-scale data to perform self-supervised learning and transfer the pre-trained parameters to downstream tasks. However, previous pre-trained models for recommendation focus on leveraging universal sequence patterns from user behaviour sequences and item information, while neglecting to capture personalized interests with the heterogeneous user information, which has been shown to be effective in contributing to personalized recommendation. In this paper, we propose a method to enhance pre-trained models with heterogeneous user information, called User-aware Pre-training for Recommendation (UPRec). Specifically, UPRec leverages the user attributes and structured social graphs to construct self-supervised objectives in the pre-training stage and proposes two user-aware pre-training tasks. Comprehensive experimental results on several real-world large-scale recommendation datasets demonstrate that UPRec can effectively integrate user information into pre-trained models and thus provide more appropriate recommendations for users.

preprint2020arXiv

$\textit{Ab Initio}$ Mismatched Interface Theory of Graphene on $α$-RuCl$_3$: Doping and Magnetism

Recent developments in twisted and lattice-mismatched bilayers have revealed a rich phase space of van der Waals systems and generated excitement. Among these systems are heterobilayers which can offer new opportunities to control van der Waals systems with strong in plane correlations such as spin-orbit-assisted Mott insulator $α$-RuCl$_3$. Nevertheless, a theoretical $\textit{ab initio}$ framework for mismatched heterobilayers without even approximate periodicity is sorely lacking. We propose a general strategy for calculating electronic properties of such systems, mismatched interface theory (MINT), and apply it to the graphene/$α$-RuCl$_{3}$ (GR/$α$-RuCl$_{3}$) heterostructure. Using MINT, we predict uniform doping of 4.77$\%$ from graphene to $α$-RuCl$_3$ and magnetic interactions in $α$-RuCl$_3$ to shift the system toward the Kitaev point. Hence we demonstrate that MINT can guide targeted materialization of desired model systems and discuss recent experiments on GR/$α$-RuCl$_{3}$ heterostructures.

preprint2020arXiv

$α$ Decay Half-life Estimation and Uncertainty Analysis

The non-parametric bootstrap method is used to evaluate the uncertainties of two $α$ decay formulas, the universal decay law (UDL) and the new Geiger-Nuttall law (NGNL). Such a method can simultaneously obtain the uncertainty of each parameter, the correlation between each pair of parameters, and the total, statistical, and systematic uncertainties of each formula. Both even-even (ee) nuclei and odd-A (oA) nuclei are used in the analysis. The collected data are separated into three parts: ee nuclei, oA nuclei without spin or parity change (oA_nc), and oA nuclei with spin and/or parity change (oA_c). Based on the residues between observed data and corresponding calculations, the statistical and systematic uncertainties are decomposed from the total uncertainty, from which one can clarify the effects from the shell structure, pairing, and angular momentum change on describing $α$ decay half-life. If $N > 126$ and $N \leqslant 126$ nuclei are considered together, the systematic uncertainty of residues between observed and predicted half-lives is larger than if those groups are considered separately. Without the shell correction term, a much larger systematic uncertainty is found if parameters obtained for $N \leqslant 126$ nuclei are used to describe the half-lives of $N > 126$ nuclei. A global hindrance on the $α$ decay process is found in oA_nc (oA_c) nuclei compared with ee (oA_nc) nuclei. If parameters obtained from ee (oA_nc) nuclei are used, the half-lives of oA_nc (oA_c) nuclei are generally underestimated with large systematic uncertainties, which can be related to the contribution of the pairing effect and angular momentum. The recently observed superallowed decay from $^{104}$Te to $^{100}$Sn is also discussed based on uncertainty analysis. (Abstract is not fully presented because of length limitation)
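The resampling idea behind the non-parametric bootstrap can be sketched as follows. This is a minimal illustration that refits a straight line to resampled data points and reports the spread of the fitted parameters; it is not the UDL or NGNL fit from the paper, and the function names and the straight-line model are assumptions made for illustration.

```python
import random
import statistics

def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept; None if degenerate."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all x equal in this resample, slope undefined
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_line(points, n_boot=500, seed=0):
    """Resample the data with replacement and refit the line each time."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        resample = [rng.choice(points) for _ in points]
        fit = fit_line(resample)
        if fit is not None:
            fits.append(fit)
    slopes = [s for s, _ in fits]
    intercepts = [c for _, c in fits]
    return {
        "slope_mean": statistics.fmean(slopes),
        "slope_std": statistics.pstdev(slopes),
        "intercept_mean": statistics.fmean(intercepts),
        "intercept_std": statistics.pstdev(intercepts),
    }

if __name__ == "__main__":
    data = [(x, 2.0 * x + 1.0) for x in range(10)]
    print(bootstrap_line(data))
```

The standard deviations over the bootstrap fits serve as parameter uncertainty estimates; correlations between parameters could be computed from the same list of fits.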

preprint2020arXiv

A generalized boundary condition applied to Lieb-Schultz-Mattis type ingappabilities and many-body Chern numbers

We introduce a new boundary condition which renders the flux-insertion argument for the Lieb-Schultz-Mattis type theorems in two or higher dimensions free from the specific choice of system sizes. It also enables a formulation of the Lieb-Schultz-Mattis type theorems in arbitrary dimensions in terms of the anomaly in field theories of $1+1$ dimensions with a bulk correspondence as a BF-theory in 2+1 dimensions. Furthermore, we apply the anomaly-based formulation to the constraints on a half-filled spinless fermion on a square lattice with $π$ flux, utilizing time-reversal, the magnetic translation and on-site internal $U(N)$ symmetries. This demonstrates the role of time-reversal anomaly on the ingappabilities of a lattice model.

preprint2020arXiv

Accurate many-body electronic structure near the basis set limit: application to the chromium dimer

We describe a method for computing near-exact energies for correlated systems with large Hilbert spaces. The method efficiently identifies the most important basis states (Slater determinants) and performs a variational calculation in the subspace spanned by these determinants. A semistochastic approach is then used to add a perturbative correction to the variational energy to compute the total energy. The size of the variational space is progressively increased until the total energy converges to within the desired tolerance. We demonstrate the power of the method by computing a near-exact potential energy curve (PEC) for a very challenging molecule -- the chromium dimer.

preprint2020arXiv

Boosting Semantic Human Matting with Coarse Annotations

Semantic human matting aims to estimate the per-pixel opacity of the foreground human regions. It is quite challenging and usually requires user interactive trimaps and plenty of high quality annotated data. Annotating this kind of data is labor intensive and requires great skills beyond normal users, especially considering the very detailed hair part of humans. In contrast, coarse annotated human datasets are much easier to acquire and collect from public datasets. In this paper, we propose to use coarse annotated data coupled with fine annotated data to boost end-to-end semantic human matting without trimaps as extra input. Specifically, we train a mask prediction network to estimate the coarse semantic mask using the hybrid data, and then propose a quality unification network to unify the quality of the previous coarse mask outputs. A matting refinement network takes in the unified mask and the input image to predict the final alpha matte. The collected coarse annotated dataset enriches our dataset significantly and allows generating high quality alpha mattes for real images. Experimental results show that the proposed method performs comparably against state-of-the-art methods. Moreover, the proposed method can be used for refining coarse annotated public datasets, as well as semantic segmentation methods, which reduces the cost of annotating high quality human data to a great extent.

preprint2020arXiv

Chemistry of the spin-1/2 kagome Heisenberg antiferromagnet

We believe that a necessary first step in understanding the ground state properties of the spin-${\scriptstyle\frac{1}{2}}$ kagome Heisenberg antiferromagnet is a better understanding of this model's very large number of low energy singlet states. A description of the low energy states that is both accurate and amenable for numerical work may ultimately prove to have greater value than knowing only what these properties are, in particular when these turn on the delicate balance of many small energies. We demonstrate how this program would be implemented using the basis of spin-singlet dimerized states, though other bases that have been proposed may serve the same purpose. The quality of a basis is evaluated by its participation in all the low energy singlets, not just the ground state. From an experimental perspective, and again in light of the small energy scales involved, methods that can deliver all the low energy states promise more robust predictions than methods that only refine a fraction of these states.

preprint2020arXiv

DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths

Over-parameterization is ubiquitous nowadays in training neural networks to benefit both optimization in seeking global optima and generalization in reducing prediction error. However, compressive networks are desired in many real world applications and direct training of small networks may be trapped in local optima. In this paper, instead of pruning or distilling over-parameterized models to compressive ones, we propose a new approach based on differential inclusions of inverse scale spaces. Specifically, it generates a family of models from simple to complex ones that couples a pair of parameters to simultaneously train over-parameterized deep models and structural sparsity on weights of fully connected and convolutional layers. Such a differential inclusion scheme has a simple discretization, proposed as Deep structurally splitting Linearized Bregman Iteration (DessiLBI), whose global convergence analysis in deep learning establishes that, from any initialization, algorithmic iterations converge to a critical point of empirical risks. Experimental evidence shows that DessiLBI achieves comparable and even better performance than the competitive optimizers in exploring the structural sparsity of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets" in early epochs: the effective sparse structure with comparable test accuracy to fully trained over-parameterized models.

preprint2020arXiv

Efficient Estimation For The Cox Proportional Hazards Cure Model

While analysing time-to-event data, it is possible that a certain fraction of subjects will never experience the event of interest and they are said to be cured. When this feature of survival models is taken into account, the models are commonly referred to as cure models. In the presence of covariates, the conditional survival function of the population can be modelled using a cure model which depends on the probability of being uncured (incidence) and the conditional survival function of the uncured subjects (latency), and a combination of logistic regression and Cox proportional hazards (PH) regression is used to model the incidence and latency respectively. In this paper, we show the asymptotic normality of the profile likelihood estimator via asymptotic expansion of the profile likelihood and obtain the explicit form of the variance estimator with an implicit function in the profile likelihood. We also show that the efficient score function based on projection theory and the profile likelihood score function are equal. Our contribution in this paper is that we express the efficient information matrix as the variance of the profile likelihood score function. A simulation study suggests that the estimated standard errors from bootstrap samples (SMCURE package) and the profile likelihood score function (our approach) provide similar and comparable results. The numerical result of our proposed method is also shown by using the melanoma data from the SMCURE R-package (Cai et al., 2012) and we compare the results with the output obtained from the SMCURE package.

preprint2020arXiv

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces. Recent methods address this challenge through the use of largely unstructured neural networks that effectively distill conditional mapping and priors over 3D shape. In this work, we induce structure and geometric constraints by leveraging three core observations: (1) the surface of most everyday objects is often almost entirely exposed from pairs of typical opposite views; (2) everyday objects often exhibit global reflective symmetries which can be accurately predicted from single views; (3) opposite orthographic views of a 3D shape share consistent silhouettes. Following these observations, we first predict orthographic 2.5D visible surface maps (depth, normal and silhouette) from perspective 2D images, and detect global reflective symmetries in this data; second, we predict the back facing depth and normal maps using as input the front maps and, when available, the symmetric reflections of these maps; and finally, we reconstruct a 3D mesh from the union of these maps using a surface reconstruction method best suited for this data. Our experiments demonstrate that our framework outperforms state-of-the-art approaches for 3D shape reconstruction from 2D and 2.5D data in terms of input fidelity and detail preservation. Specifically, we achieve 12% better performance on average on the ShapeNet benchmark dataset, and up to 19% for certain classes of objects (e.g., chairs and vessels).

preprint2020arXiv

Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect

Recommender systems aim to provide item recommendations for users, and are usually faced with the data sparsity problem (e.g., cold start) in real-world scenarios. Recently, pre-trained models have shown their effectiveness in knowledge transfer between domains and tasks, which can potentially alleviate the data sparsity problem in recommender systems. In this survey, we first provide a review of recommender systems with pre-training. In addition, we show the benefits of pre-training to recommender systems through experiments. Finally, we discuss several promising directions for future research for recommender systems with pre-training.

preprint2020arXiv

Large anomalous Hall effect in a hexagonal ferromagnetic Fe5Sn3 single crystal

In this paper, we report an experimental observation of the large anomalous Hall effect (AHE) in a hexagonal ferromagnetic Fe5Sn3 single crystal with current along the b axis and a magnetic field normal to the bc plane. The intrinsic contribution of the anomalous Hall conductance $\sigma_{AH}^{\mathrm{int}}$ was approximately 613 $\Omega^{-1}$ cm$^{-1}$, which was more than 3 times the maximum value in the frustrated kagome magnet Fe3Sn2 and nearly independent of the temperature over a wide range between 5 and 350 K. The analysis results revealed that the large AHE was dominated by a common, intrinsic term, while the extrinsic contribution, i.e., the skew scattering and side jump, turned out to be small. In addition to the large AHE, it was found the types of majority carriers changed at approximately 275 and 30 K, consistent with the critical temperatures of the spin reorientation. These findings suggest that the hexagonal ferromagnetic Fe5Sn3 single crystal is an excellent candidate to use for the study of the topological features in ferromagnets.

preprint2020arXiv

Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of finding the needle in a haystack

The task of using machine learning to approximate the mapping $\mathbf{x}\mapsto\sum_{i=1}^d x_i^2$ with $x_i\in[-1,1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network is similar to finding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples is needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as $d$ increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}(d^{2.5})$, which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}(d^{4})$. Without explicit regularization (also called implicit regularization), the numerically observed sample complexity is significantly higher and is close to $\mathcal{O}(d^{4.5})$.
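The "sparse network" idea in this abstract can be illustrated with a minimal sketch. It assumes a ReLU construction that is not taken from the paper: each input coordinate feeds its own small group of hidden units (the hypothetical `square_unit`), which together form a piecewise-linear interpolant of $t^2$ on $[-1,1]$. Because no hidden unit sees more than one coordinate, the network is sparse, and its worst-case error is at most $d\,h^2/4$ for knot spacing $h$.

```python
def relu(t):
    """Rectified linear unit."""
    return t if t > 0.0 else 0.0


def square_unit(t, segments=20):
    """Piecewise-linear interpolant of t**2 on [-1, 1], written as a sum of ReLUs.

    Hypothetical construction for illustration; it uses `segments` hidden
    units that all read the same single input coordinate.
    """
    h = 2.0 / segments
    # First linear piece: value 1 at t = -1, slope (h - 2) on [-1, -1 + h].
    out = 1.0 + (h - 2.0) * relu(t + 1.0)
    # Each interior knot raises the slope by 2h so the chords follow t**2.
    for j in range(1, segments):
        knot = -1.0 + j * h
        out += 2.0 * h * relu(t - knot)
    return out


def sparse_sum_of_squares(x, segments=20):
    """Separable (sparse) approximation of sum_i x_i**2: one block per coordinate."""
    return sum(square_unit(xi, segments) for xi in x)


def target(x):
    """The exact mapping x -> sum_i x_i**2."""
    return sum(xi * xi for xi in x)


def max_error_bound(d, segments=20):
    """Worst-case error d * h**2 / 4 of the interpolant for x in [-1, 1]^d."""
    h = 2.0 / segments
    return d * h * h / 4.0
```

With `segments=20` and `d=5` the bound evaluates to 0.0125, using `d * segments` hidden units in total. This only illustrates representability; the sample-complexity comparison in the abstract concerns training, which the sketch does not attempt.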

preprint2020arXiv

Leveraging both Lesion Features and Procedural Bias in Neuroimaging: An Dual-Task Split dynamics of inverse scale space

The prediction and selection of lesion features are two important tasks in voxel-based neuroimage analysis. Existing multivariate learning models treat the two tasks equivalently and optimize them simultaneously. However, in addition to lesion features, we observe that there is another type of feature, commonly introduced during preprocessing, which can improve the prediction result. We call this type of feature procedural bias. Therefore, in this paper, we propose that the features/voxels in neuroimage data consist of three orthogonal parts: lesion features, procedural bias, and null features. To stably select lesion features and leverage procedural bias in prediction, we propose an iterative algorithm (termed GSplit LBI) as a discretization of the differential inclusion of inverse scale space, which combines a Variable Splitting scheme with Linearized Bregman Iteration (LBI). Specifically, with a variable splitting term, two estimators are introduced and split apart, i.e., one for feature selection (the sparse estimator) and the other for prediction (the dense estimator). Implemented with LBI, the solution path of both estimators can be returned with different sparsity levels on the sparse estimator for the selection of lesion features. In addition, the dense estimator can leverage procedural bias to further improve prediction results. To test the efficacy of our method, we conduct experiments on a simulated study and the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The validity and the benefit of our model are shown by the improvement of prediction results and the interpretability of visualized procedural bias and lesion features.
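The interplay between the sparse and the dense estimator can be sketched with a generic variable-splitting Linearized Bregman Iteration. This is an illustrative simplification, not the GSplit LBI algorithm itself: it assumes a squared loss $\frac{1}{2n}\|y - X\beta\|^2 + \frac{1}{2\nu}\|\beta - \gamma\|^2$ with an identity splitting, and the function name `split_lbi_path` and the parameter values for `nu`, `kappa` and `alpha` are hypothetical.

```python
def soft_threshold(z, lam=1.0):
    """Shrink z toward zero by lam (proximal map of the L1 norm)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def split_lbi_path(X, y, nu=1.0, kappa=10.0, alpha=0.01, steps=300):
    """Return the path of (dense beta, sparse gamma) over `steps` iterations.

    Assumed loss: ||y - X beta||^2 / (2n) + ||beta - gamma||^2 / (2 nu).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p   # dense estimator, used for prediction
    gamma = [0.0] * p  # sparse estimator, used for feature selection
    z = [0.0] * p      # auxiliary variable driving gamma
    path = []
    for _ in range(steps):
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p))
                    for i in range(n)]
        grad_beta = [-sum(X[i][j] * residual[i] for i in range(n)) / n
                     + (beta[j] - gamma[j]) / nu for j in range(p)]
        grad_gamma = [(gamma[j] - beta[j]) / nu for j in range(p)]
        # Gradient step on the dense estimator.
        beta = [beta[j] - kappa * alpha * grad_beta[j] for j in range(p)]
        # Bregman step: gamma_j becomes nonzero only once |z_j| exceeds 1.
        z = [z[j] - alpha * grad_gamma[j] for j in range(p)]
        gamma = [kappa * soft_threshold(z[j]) for j in range(p)]
        path.append((list(beta), list(gamma)))
    return path
```

On a toy problem with one strong, one weak and one null coefficient, the strong coordinate enters the support of `gamma` first, while `beta` already carries a nonzero weight on the weak one. Early stopping along the path then plays the role of choosing the sparsity level.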

preprint2020arXiv

Nonlinear parameter-gauge coupling approach to generalization of generalized Thouless pumps and $-1$-form anomaly

We study the nontrivial topology of the parameter space of general $U(1)$-symmetric fermionic non-degenerately gapped systems and its consequences for transport properties in arbitrary dimensions. By a nonlinear parameter-gauge topological response theory, we find that such nontrivial topology can impose quantization constraints on the charge transport in the presence of background fluxes or, more generally, instantons in general dimensions. Our result generalizes the Thouless pump and its higher-dimensional generalizations. We also show that these nontrivial transport properties are related to an unconventional quantum anomaly, which generalizes $-1$-form anomalies. This anomaly imposes non-perturbative ingappabilities of various types of spatial interfaces or time-dependent system evolution.

preprint2020arXiv

Self-controlled growth of highly uniform Ge/Si hut wires for scalable qubit devices

Semiconductor nanowires have been playing a crucial role in the development of nanoscale devices for the realization of spin qubits, Majorana fermions, single photon emitters, nanoprocessors, etc. The monolithic growth of site-controlled nanowires is a prerequisite for the next generation of devices that will require addressability and scalability. Here, combining top-down nanofabrication and bottom-up self-assembly, we report on the growth of Ge wires on pre-patterned Si (001) substrates with controllable position, distance, length and structure. This is achieved by a novel growth process which uses a SiGe strain-relaxation template and can be generalized to other material combinations. Transport measurements show an electrically tunable spin-orbit coupling, with a spin-orbit length similar to that of III-V materials. Also, capacitive coupling between closely spaced wires is observed, which underlines their potential as a host for implementing two-qubit gates. The reported results open a path towards scalable qubit devices with Si compatibility.

preprint2020arXiv

Two-photon interference: the Hong-Ou-Mandel effect

Nearly 30 years ago, two-photon interference was observed, marking the beginning of a new quantum era. Indeed, two-photon interference has no classical analogue, giving it a distinct advantage for a range of applications. The peculiarities of quantum physics may now be used to our advantage to outperform classical computations, securely communicate information, simulate highly complex physical systems and increase the sensitivity of precise measurements. This separation from classical to quantum physics has motivated physicists to study two-particle interference for both fermionic and bosonic quantum objects. So far, two-particle interference has been observed with massive particles, among others, such as electrons and atoms, in addition to plasmons, demonstrating the extent of this effect to larger and more complex quantum systems. A wide array of novel applications to this quantum effect is to be expected in the future. This review will thus cover the progress and applications of two-photon (two-particle) interference over the last three decades.

preprint2020arXiv

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

In self-supervised spatio-temporal representation learning, the temporal resolution and long-short term characteristics are not yet fully explored, which limits the representation capabilities of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple yet effective way. PRP is rooted in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows the feature encoder and favors low temporal resolution and long-term representation by classifying fast-forward rates. The generative perception model acts as a feature decoder and focuses on high temporal resolution and short-term representation by introducing a motion-attention mechanism. PRP is applied to typical video target tasks including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models by significant margins. Code is available at github.com/yuanyao366/PRP
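The dilated sampling strategy mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the released PRP code: the helper names `dilated_clip` and `make_rate_example` and the candidate rate set `(1, 2, 4, 8)` are assumptions. A clip is built by keeping every `rate`-th frame, and the index of the chosen rate serves as a free classification label.

```python
def dilated_clip(num_frames, start, clip_len, rate):
    """Return the frame indices of a clip sampled every `rate` frames.

    A rate of 1 gives normal playback; larger rates mimic fast-forward.
    """
    if rate < 1 or clip_len < 1 or start < 0:
        raise ValueError("rate and clip_len must be >= 1 and start >= 0")
    last = start + (clip_len - 1) * rate
    if last >= num_frames:
        raise ValueError("clip runs past the end of the video")
    return list(range(start, last + 1, rate))


def make_rate_example(num_frames, start, clip_len, rates, rate_class):
    """Build one self-supervised example: (frame indices, playback-rate label).

    `rates` is a hypothetical tuple of candidate playback rates; the label is
    the position of the chosen rate in that tuple, so no manual annotation
    is needed.
    """
    rate = rates[rate_class]
    return dilated_clip(num_frames, start, clip_len, rate), rate_class
```

For example, `make_rate_example(100, 5, 3, (1, 2, 4, 8), 2)` selects frames 5, 9 and 13 and labels the clip with class 2. A classifier trained on such pairs has to recognise how fast the clip is playing, which is the discriminative half of the scheme; the reconstruction decoder is not covered by this sketch.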

preprint2019arXiv

Direct comparison of many-body methods for realistic electronic Hamiltonians

A large collaboration carefully benchmarks 20 first principles many-body electronic structure methods on a test set of 7 transition metal atoms, and their ions and monoxides. Good agreement is attained between the 3 systematically converged methods, resulting in experiment-free reference values. These reference values are used to assess the accuracy of modern emerging and scalable approaches to the many-electron problem. The most accurate methods obtain energies indistinguishable from experimental results, with the agreement mainly limited by the experimental uncertainties. Comparison between methods enables a unique perspective on calculations of many-body systems of electrons.

preprint2019arXiv

Observation of Magnetic Skyrmion Bubbles in a van der Waals ferromagnet Fe3GeTe2

Two-dimensional (2D) van der Waals (vdW) magnetic materials have recently been introduced as a new horizon in materials science and enable potential applications in next-generation spintronic devices. In this communication, the observation of stable Bloch-type magnetic skyrmions in single crystals of 2D vdW Fe3GeTe2 (FGT) is reported, using in-situ Lorentz transmission electron microscopy (TEM). We find that the ground-state magnetic stripe domains in FGT transform into skyrmion bubbles when an external magnetic field is applied perpendicularly to the (001) thin plate at temperatures below the Curie temperature TC. Most interestingly, a hexagonal lattice of skyrmion bubbles is obtained via field-cooling manipulation with the magnetic field applied along the [001] direction. Owing to their topological stability, the skyrmion bubble lattices are stable over large field-cooling tilt angles and are further reproduced by micromagnetic simulations. These observations directly demonstrate that 2D vdW FGT possesses a rich variety of topological spin textures, making it a promising candidate for future applications in the field of spintronics.

preprint2018arXiv

MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning

Learning a good embedding model that efficiently learns the representation coefficients between two spaces/subspaces is a typical and general problem. To solve this task, $L_{1}$ regularization is widely used for feature selection and to avoid overfitting, yet the sparse estimation of features under $L_{1}$ regularization may cause underfitting of the training data. $L_{2}$ regularization is also frequently used, but it is a biased estimator. In this paper, we propose the idea that the features consist of three orthogonal parts, namely sparse strong signals, dense weak signals and random noise, in which both strong and weak signals contribute to the fitting of data. To facilitate this novel decomposition, MSplit LBI is proposed for the first time to realize feature selection and dense estimation simultaneously. We provide theoretical and simulation-based verification that our method exceeds $L_{1}$ and $L_{2}$ regularization, and extensive experimental results show that our method achieves state-of-the-art performance in few-shot and zero-shot learning.