Source author record

Yuan Yao

Yuan Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

81 works

51 topics

4 close collaborators

preprint 2026 arXiv

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

Visual encoding constitutes a major computational bottleneck in Multimodal Large Language Models (MLLMs), especially for high-resolution image inputs. The prevailing practice typically adopts global encoding followed by post-ViT compression. Global encoding produces massive token sequences, while post-ViT compression incurs the full quadratic attention cost of the ViT before any token reduction takes place. In this work, we revisit this convention along two dimensions: the encoding strategy and visual token compression. First, controlled experiments show that slice-based encoding outperforms global encoding across benchmarks, suggesting that preserving local details through sliced views can be more beneficial than applying global attention for fine-grained perception. Second, we introduce intra-ViT early compression, which reduces tokens in shallow ViT layers and substantially lowers visual-encoding FLOPs while preserving downstream performance. By integrating intra-ViT compression into the slice-based encoding framework, we present LLaVA-UHD v4, an efficient and compute-controllable visual encoding scheme tailored for high-resolution inputs. Across a diverse set of benchmarks covering document understanding, OCR, and general VQA, LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% while matching or even surpassing baseline performance. These results suggest that visual-encoding efficiency can be substantially improved without sacrificing downstream performance, providing a practical design direction for efficient high-resolution MLLMs. All model weights and code will be publicly released to support further research.
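The cost argument in this abstract can be illustrated with a toy cost model. The sketch below is not the paper's accounting: the per-layer formula (counted in multiply-accumulates), the layer count, the width, the token count and the keep ratio are all illustrative assumptions, and `vit_layer_flops` and `encoder_flops` are hypothetical helper names.

```python
def vit_layer_flops(num_tokens: int, dim: int) -> int:
    """Rough cost of one Transformer layer in multiply-accumulates (assumed model).

    4*n*d^2 : Q, K, V and output projections
    2*n^2*d : attention scores and weighted sum (the quadratic term)
    8*n*d^2 : MLP with hidden width 4*d
    """
    n, d = num_tokens, dim
    return 4 * n * d * d + 2 * n * n * d + 8 * n * d * d


def encoder_flops(num_tokens, dim, num_layers, compress_at=None, keep_ratio=1.0):
    """Total cost when tokens are reduced right after layer `compress_at`.

    compress_at=None means no reduction inside the encoder, which is the
    post-ViT compression case: every layer pays for the full sequence.
    """
    total = 0
    n = num_tokens
    for layer in range(num_layers):
        total += vit_layer_flops(n, dim)
        if compress_at is not None and layer + 1 == compress_at:
            n = max(1, int(n * keep_ratio))
    return total


# Hypothetical settings: 24 layers, width 1024, 4096 input tokens.
post_vit = encoder_flops(4096, 1024, 24)
early = encoder_flops(4096, 1024, 24, compress_at=4, keep_ratio=0.25)
savings = 1 - early / post_vit
print(f"early compression saves {savings:.1%} of encoder cost in this toy model")
```

Because the attention term grows with the square of the token count, reducing tokens in a shallow layer lowers the cost of every later layer, whereas reducing them after the last layer saves nothing inside the encoder.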

preprint 2026 arXiv

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps by real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real-time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computation efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices with less than 12GB RAM cost.

preprint 2026 arXiv

Uncertainty Quantification for LLM-based Code Generation

Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular, large language model (LLM) based code generation, remains a challenging problem. An existing attempt proposes PAC prediction sets but is limited by its strong monotonicity assumption on risk and single-label classification framework, which severely limits the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose an approach RisCoSet that leverages multiple hypothesis testing to construct risk-controlling predictions for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state-of-the-art, our method can significantly reduce the code removal by up to 24.5%, at the same level of risk.
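To make "multiple hypothesis testing for risk control" concrete, here is a minimal generic sketch. It is not RisCoSet itself: it assumes a Hoeffding p-value per candidate and a Bonferroni correction, both standard tools chosen here for illustration, and the candidate names and calibration data are hypothetical.

```python
import math


def hoeffding_p_value(empirical_risk: float, n: int, alpha: float) -> float:
    """p-value for H0: true risk > alpha, valid for losses bounded in [0, 1]."""
    gap = max(0.0, alpha - empirical_risk)
    return math.exp(-2.0 * n * gap * gap)


def risk_controlling_candidates(losses_by_candidate, alpha, delta):
    """Return candidates certified to have risk <= alpha with prob. >= 1 - delta.

    losses_by_candidate maps a candidate setting (e.g. a pruning threshold) to
    per-example losses in [0, 1] measured on a held-out calibration set.
    The Bonferroni split delta / m controls the error over all m candidates.
    """
    m = len(losses_by_candidate)
    valid = []
    for candidate, losses in losses_by_candidate.items():
        n = len(losses)
        risk = sum(losses) / n
        if hoeffding_p_value(risk, n, alpha) <= delta / m:
            valid.append(candidate)
    return valid


# Hypothetical calibration data: 500 examples per candidate, loss 1 = failure.
calibration = {
    "conservative": [1] * 10 + [0] * 490,   # empirical risk 0.02
    "moderate": [1] * 40 + [0] * 460,       # empirical risk 0.08
    "aggressive": [1] * 100 + [0] * 400,    # empirical risk 0.20
}
valid = risk_controlling_candidates(calibration, alpha=0.1, delta=0.1)
print(valid)
```

A candidate whose empirical risk is only slightly below the target (here "moderate") is not certified, because the calibration sample is too small to rule out a true risk above the target.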

preprint 2024 arXiv

CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

When exploring the development of Artificial General Intelligence (AGI), a critical task for these models involves interpreting and processing information from multiple image inputs. However, Large Multimodal Models (LMMs) encounter two issues in such scenarios: (1) a lack of fine-grained perception, and (2) a tendency to blend information across multiple images. We first extensively investigate the capability of LMMs to perceive fine-grained visual details when dealing with multiple input images. The research focuses on two aspects: first, image-to-image matching (to evaluate whether LMMs can effectively reason and pair relevant images), and second, multi-image-to-text matching (to assess whether LMMs can accurately capture and summarize detailed image information). We conduct evaluations on a range of both open-source and closed-source large models, including GPT-4V, Gemini, OpenFlamingo, and MMICL. To enhance model performance, we further develop a Contrastive Chain-of-Thought (CoCoT) prompting approach based on multi-input multimodal models. This method requires LMMs to compare the similarities and differences among multiple image inputs, and then guide the models to answer detailed questions about multi-image inputs based on the identified similarities and differences. Our experimental results showcase CoCoT's proficiency in enhancing the multi-image comprehension capabilities of large multimodal models.

preprint 2024 arXiv

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars. Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalanced viewing angles and imprecise pose priors, our approach aims to develop a zero-shot 3D generative scheme capable of producing visually realistic, geometrically accurate and content-wise diverse 3D humans without relying on pre-existing 3D or 2D assets. To address this challenge, we introduce a meticulously crafted workflow that implements accurate physical modeling to learn the enhanced 3D generative model from synthetic 2D data. During inference, we integrate optimization modules to bridge the gap between realistic appearances and coarse 3D shapes. Specifically, En3D comprises three modules: a 3D generator that accurately models generalizable 3D humans with realistic appearance from synthesized balanced, diverse, and structured human images; a geometry sculptor that enhances shape quality using multi-view normal constraints for intricate human anatomy; and a texturing module that disentangles explicit texture maps with fidelity and editability, leveraging semantical UV partitioning and a differentiable rasterizer. Experimental results show that our approach significantly outperforms prior works in terms of image quality, geometry accuracy and content diversity. We also showcase the applicability of our generated avatars for animation and editing, as well as the scalability of our approach for content-style free adaptation.

preprint 2023 arXiv

Duality viewpoint of criticality

In this work, we study quantum many-body systems which are self-dual under a duality transformation connecting different symmetry protected topological (SPT) phases. We provide a geometric explanation of the criticality of these self-dual models. More precisely, we show a ground state (quasi-)degeneracy under periodic boundary conditions, i.e., the ingappability of the bulk spectrum. Equivalently, the symmetry group at criticality, including the duality symmetry, has a mixed 't Hooft anomaly. This approach not only predicts the spectrum of self-dual models with ordinary 0-form symmetry, but also applies to models with generalized symmetries, such as higher-form and subsystem symmetries. As an application, we illustrate our results with several examples in one and two dimensions, which separate two different SPTs.

preprint 2023 arXiv

The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments. This paper conducts a comprehensive investigation into the effective utilization of LLMs for repairing CR defects. In this study, various prompts are designed and compared across mainstream LLMs using two distinct datasets from human reviewers and automated checkers. Experimental results demonstrate a remarkable repair rate of 72.97% with the best prompt, highlighting a substantial improvement in the effectiveness and practicality of automatic repair techniques.

preprint 2022 arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and their application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. For each topic, we summarize the current studies and propose some future research directions. At the end of this paper, we discuss the further development of BMs from a more general view.

preprint 2022 arXiv

A special cross-tie domain wall in helimagnet

A special cross-tie (SCT) domain wall was discovered in the helimagnet MnCoSi alloy via magnetic vector field tomography in Lorentz transmission electron microscopy (LTEM). Unlike the traditional cross-tie (TCT) domain wall, where the convergent/divergent magnetic moment configurations line up one by one, relatively large Bloch-type sub-walls emerge in this new SCT domain wall, and two mutually perpendicular rotation axes coexist in it. The straight magnetic stripes accompanying the unraveled domain walls hint at the complex mechanism that forms this SCT structure. Interestingly, different orientations of this domain wall in LTEM can readily exhibit various magnetic features, including meron/antimeron chains or bimeron strings.

preprint 2022 arXiv

Capacity Analysis of Holographic MIMO Channels with Practical Constraints

Holographic Multiple-Input and Multiple-Output (MIMO) is envisioned as a promising technology to realize unprecedented spectral efficiency by integrating a large number of antennas into a compact space. Most research on holographic MIMO is based on isotropic scattering environments, and the antenna gain is assumed to be unlimited by deployment space. However, the channel might not satisfy isotropic scattering because of generalized angle distributions, and the antenna gain is limited by the array aperture in reality. In this letter, we aim to analyze the holographic MIMO channel capacity under practical angle distribution and array aperture constraints. First, we calculate the spectral density for generalized angle distributions by introducing a wavenumber domain-based method. Then, the capacity under generalized angle distributions is analyzed, and two different aperture schemes are considered. Finally, numerical results show that the capacity is noticeably affected by the angle distribution at high signal-to-noise ratio (SNR) but hardly affected at low SNR, and that the capacity does not increase indefinitely with antenna density because of the array aperture constraint.

preprint 2022 arXiv

Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer

Backdoor attacks have been shown to be a serious security threat against deep learning models, and detecting whether a given model has been backdoored becomes a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually of small size or affects the activation of only a few neurons. However, the above observations are violated in many cases especially for advanced backdoor attacks, hindering the performance and applicability of the existing defenses. In this paper, we propose a backdoor defense DTInspector built upon a new observation. That is, an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with a high probability. Based on this observation, DTInspector first learns a patch that could change the predictions of most high-confidence data, and then decides the existence of backdoor by checking the ratio of prediction changes after applying the learned patch on the low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attacking types demonstrate the effectiveness of the proposed defense.

preprint 2022 arXiv

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

Pre-Trained Vision-Language Models (VL-PTMs) have shown promising capabilities in grounding natural language in image data, facilitating a broad variety of cross-modal tasks. However, we note that there exists a significant gap between the objective forms of model pre-training and fine-tuning, resulting in a need for large amounts of labeled data to stimulate the visual grounding capability of VL-PTMs for downstream tasks. To address the challenge, we present Cross-modal Prompt Tuning (CPT, alternatively, Colorful Prompt Tuning), a novel paradigm for tuning VL-PTMs, which reformulates visual grounding into a fill-in-the-blank problem with color-based co-referential markers in image and text, maximally mitigating the gap. In this way, CPT enables strong few-shot and even zero-shot visual grounding capabilities of VL-PTMs. Comprehensive experimental results show that the prompt-tuned VL-PTMs outperform their fine-tuned counterparts by a large margin (e.g., 17.3% absolute accuracy improvement, and 73.8% relative standard deviation reduction on average with one shot in RefCOCO evaluation). We make the data and code for this paper publicly available at https://github.com/thunlp/CPT.

preprint 2022 arXiv

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained by partial observations (i.e., stylized heads). Few-shot learning based style transfer is challenging since the learned model can easily become overfitted in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to handle the challenge by adopting the key idea of "calibration first, translation later" and exploring the augmented global structure with locally-focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority over the state of the art in head stylization and its effectiveness on full image translation with adaptive deformations.

preprint 2022 arXiv

Detecting Topology Attacks against Graph Neural Networks

Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.
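The "two competing group forces" idea can be illustrated with a simple disagreement score over a node's neighbors. This is a sketch under stated assumptions, not the paper's measurement: the score (one minus the share of the most common neighbor label), the function name and the toy graph are all hypothetical.

```python
from collections import Counter


def neighborhood_disagreement(adjacency, predicted_label):
    """Score each node by how split its neighbors' predicted labels are.

    Score = 1 - (share of the most common neighbor label).
    0.0 means all neighbors agree; higher values mean competing groups.
    """
    scores = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            scores[node] = 0.0
            continue
        counts = Counter(predicted_label[n] for n in neighbors)
        top_count = counts.most_common(1)[0][1]
        scores[node] = 1.0 - top_count / len(neighbors)
    return scores


# Hypothetical undirected graph: node "a" bridges a label-0 group and a label-1 group.
adjacency = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "e", "f"],
    "e": ["a", "d", "f"],
    "f": ["d", "e"],
}
predicted_label = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

scores = neighborhood_disagreement(adjacency, predicted_label)
suspect = max(scores, key=scores.get)
print(suspect, scores[suspect])
```

In this toy graph the bridging node receives the highest score because its neighborhood is evenly split between the two labels.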

preprint 2022 arXiv

Equiangular lines with a fixed angle

Solving a longstanding problem on equiangular lines, we determine, for each given fixed angle and in all sufficiently large dimensions, the maximum number of lines pairwise separated by the given angle. Fix $0 < α < 1$. Let $N_α(d)$ denote the maximum number of lines through the origin in $\mathbb{R}^d$ with pairwise common angle $\arccos α$. Let $k$ denote the minimum number (if it exists) of vertices in a graph whose adjacency matrix has spectral radius exactly $(1-α)/(2α)$. If $k < \infty$, then $N_α(d) = \lfloor k(d-1)/(k-1) \rfloor$ for all sufficiently large $d$, and otherwise $N_α(d) = d + o(d)$. In particular, $N_{1/(2k-1)}(d) = \lfloor k(d-1)/(k-1) \rfloor$ for every integer $k\ge 2$ and all sufficiently large $d$. A key ingredient is a new result in spectral graph theory: the adjacency matrix of a connected bounded degree graph has sublinear second eigenvalue multiplicity.
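As a worked check of the special case stated in the abstract (added here for illustration, not taken from the paper): for $α = 1/(2k-1)$ the target spectral radius is $k-1$. The complete graph on $k$ vertices attains it, and no smaller graph can, since the spectral radius never exceeds the maximum degree. The first two cases then follow from the formula.

```latex
\[
\left.\frac{1-\alpha}{2\alpha}\right|_{\alpha=\frac{1}{2k-1}}
  = \frac{\;\frac{2k-2}{2k-1}\;}{\;\frac{2}{2k-1}\;} = k-1 ,
\]
\[
N_{1/3}(d) = \left\lfloor \frac{2(d-1)}{1} \right\rfloor = 2(d-1), \qquad
N_{1/5}(d) = \left\lfloor \frac{3(d-1)}{2} \right\rfloor
\quad \text{for all sufficiently large } d .
\]
```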

preprint 2022 arXiv

Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces

The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge number of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation have been developed, they are expensive because, as backward selection methods, they require fully training a dense network, and there is still a void in systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks can be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils "winning tickets" in early epochs: effective sparse network structures with test accuracy comparable to fully trained over-parameterized models, which are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating good performance with much less computational cost. Codes and models can be downloaded at https://github.com/DessiLBI2020/DessiLBI.

preprint 2022 arXiv

Fine-Grained Scene Graph Generation with Data Transfer

Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to several frequent but uninformative predicates (e.g., on, at), which limits practical application of these models in downstream tasks. To deal with the problems above, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large SGG with 1,807 predicate classes. Our IETrans tries to relieve the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. By training on the enhanced dataset, a Neural Motif model doubles the macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.
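The gap between macro and micro performance on a long-tailed predicate distribution can be seen in a small example. The sketch below assumes recall-style metrics (pooled recall versus the mean of per-class recalls); the abstract does not define the metrics, and the counts and function names are hypothetical.

```python
def micro_recall(correct, total):
    """Recall pooled over every instance, so frequent classes dominate."""
    return sum(correct.values()) / sum(total.values())


def macro_recall(correct, total):
    """Mean of per-class recalls, so every class counts equally."""
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Hypothetical counts: one frequent predicate and one rare, informative one.
total = {"on": 90, "standing on": 10}
correct = {"on": 80, "standing on": 2}

micro = micro_recall(correct, total)
macro = macro_recall(correct, total)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

A model that handles only the frequent predicate well still scores highly on the pooled metric, while the per-class mean exposes the weak tail class.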

preprint 2022 arXiv

From Cascades to $J$-holomorphic Curves and Back

This paper develops the analysis needed to set up a Morse-Bott version of embedded contact homology (ECH) of a contact three-manifold in certain cases. In particular we establish a correspondence between "cascades" of holomorphic curves in the symplectization of a Morse-Bott contact form, and holomorphic curves in the symplectization of a nondegenerate perturbation of the contact form. The cascades we consider must be transversely cut out and rigid. We accomplish this by studying the adiabatic degeneration of $J$-holomorphic curves into cascades and establishing a gluing theorem. We note that our gluing theorem, under appropriate transversality hypotheses, should work in higher dimensions as well. The details of ECH applications will appear elsewhere.

preprint 2022 arXiv

Gappability Index for Quantum Many-Body Systems

We propose an index $\mathcal{I}_G$ which characterizes the degree of ingappability, namely the difficulty to induce a unique ground state with a nonvanishing excitation gap, in the presence of a symmetry $G$. $\mathcal{I}_G$ represents the dimension of the subspace of ambient uniquely-gapped theories in the entire $G$-invariant "theory space". The celebrated Lieb-Schultz-Mattis theorem corresponds, in our formulation, to the case $\mathcal{I}_G=0$ (completely ingappable) for the symmetry $G$ including the lattice translation symmetry. We illustrate the usefulness of the index by discussing the phase diagram of spin-$1/2$ antiferromagnets in various dimensions, which do not necessarily have the translation symmetry.

preprint 2022 arXiv

Gate-Level Side-Channel Leakage Assessment with Architecture Correlation Analysis

While side-channel leakage is traditionally evaluated from a fabricated chip, it is more time-efficient and cost-effective to do so during the design phase of the chip. We present a methodology to rank the gates of a design according to their contribution to the side-channel leakage of the chip. The methodology relies on logic synthesis, logic simulation, gate-level power estimation, and gate leakage assessment to compute a ranking. The ranking metric can be defined as a specific test by correlating gate-level activity with a leakage model, or else as a non-specific test by evaluating gate-level activity in response to distinct test vector groups. Our results show that only a minority of the gates in a design contribute most of the side-channel leakage. We demonstrate this property for several designs, including a hardware AES coprocessor and a cryptographic hardware/software interface in a five-stage pipelined RISC processor.
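The "specific test" described in this abstract, correlating gate-level activity with a leakage model, can be sketched as a ranking by absolute Pearson correlation. This is an illustration only: the gate names, the activity traces and the leakage values are hypothetical, and the paper's actual metric and tool flow may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant trace carries no correlation
    return cov / math.sqrt(var_x * var_y)


def rank_gates(gate_activity, leakage_model):
    """Order gates from most to least correlated with the leakage model."""
    scores = {g: abs(pearson(a, leakage_model)) for g, a in gate_activity.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: one value per test vector.
leakage_model = [0, 1, 2, 3, 4, 3, 2, 1]  # e.g. Hamming weight of a secret-dependent value
gate_activity = {
    "g_sbox": [0, 1, 2, 3, 4, 3, 2, 1],   # tracks the model exactly
    "g_ctrl": [1, 1, 1, 1, 1, 1, 1, 1],   # constant activity
    "g_mix": [1, 0, 2, 1, 3, 4, 0, 2],    # partly related
}
ranking = rank_gates(gate_activity, leakage_model)
print(ranking)
```

The gates at the top of such a ranking are the natural first candidates for countermeasures, which matches the abstract's finding that a minority of gates contributes most of the leakage.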

preprint 2022 arXiv

Generative Adversarial Networks for Robust Cryo-EM Image Denoising

Cryo-electron microscopy (Cryo-EM) has become popular for macromolecular structure determination. However, the 2D images that Cryo-EM produces are very noisy and often mixed with multiple heterogeneous conformations or contamination, which makes denoising challenging. Traditional image denoising methods cannot remove Cryo-EM image noise well when the signal-to-noise ratio (SNR) of the images is very low. It is therefore desirable to develop new, effective denoising techniques to facilitate further research such as 3D reconstruction and 2D conformation classification. In this paper, we approach the robust image denoising problem in Cryo-EM by a joint Autoencoder and Generative Adversarial Networks (GAN) method. Equipped with a robust $\ell_1$ Autoencoder and some designs of robust $β$-GANs, one can stabilize the training of GANs and achieve state-of-the-art performance in robust denoising with low SNR data and against possible information contamination. The method is evaluated on both a heterogeneous conformational dataset of the Thermus aquaticus RNA Polymerase (RNAP) and a homogeneous dataset of the Plasmodium falciparum 80S ribosome (EMPIAR-10028), in terms of Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), as well as heterogeneous conformation clustering. These results suggest that our proposed methodology provides an effective tool for Cryo-EM 2D image denoising. Our code is available at https://github.com/ghl1995/denoise-gan-in-cryo-em.
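Two of the evaluation metrics named in this abstract, MSE and PSNR, have standard definitions that are short enough to write out. The sketch below uses flat lists of pixel intensities and assumes a peak value of 1.0; the sample values are hypothetical and unrelated to the paper's data.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equal-length pixel sequences."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return sum(squared) / len(squared)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(reference, estimate)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical intensities in [0, 1].
clean = [0.0, 0.5, 1.0, 0.5]
denoised = [0.1, 0.5, 0.9, 0.5]

print(f"MSE = {mse(clean, denoised):.4f}, PSNR = {psnr(clean, denoised):.2f} dB")
```

Lower MSE and higher PSNR both indicate a denoised image closer to the reference; SSIM is omitted here because it needs local window statistics.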

preprint 2022 arXiv

Geometric approach to Lieb-Schultz-Mattis theorem without translation symmetry under inversion or rotation symmetry

We propose a geometric approach to the Lieb-Schultz-Mattis theorem for quantum many-body systems with discrete spin-rotation symmetries and lattice inversion or rotation symmetry, but without translation symmetry assumed. Under symmetry-twisting on a $(d-1)$-dimensional plane, we find that any $d$-dimensional inversion-symmetric spin system possesses a doubly degenerate spectrum when it hosts a half-integer spin at the inversion-symmetric point. We also show that any rotation-symmetric generalized spin model with a projective representation at the rotation center has a similar degeneracy under symmetry-twisting. We argue that these degeneracies imply that a unique symmetric gapped ground state that is smoothly connected to product states is forbidden in the original untwisted systems -- generalized inversional/rotational Lieb-Schultz-Mattis theorems without lattice translation symmetry imposed. The traditional Lieb-Schultz-Mattis theorems with translations also fit in the proposed framework.

preprint 2022 arXiv

Observation of short-period helical spin order and magnetic transition in a non-chiral centrosymmetric helimagnet

The search for materials exhibiting nanoscale spiral order continues to be fuelled by the promise of emergent inductors. Although such spin textures have been reported in many materials, most of them exhibit long periods or are limited to operating far below room temperature. Here, we present the real-space observation of an ordered helical spin order with a period of 3.2 nm in a non-chiral centrosymmetric helimagnet MnCoSi at room temperature via a multi-angle and multi-azimuth approach to Lorentz transmission electron microscopy (TEM). A magnetic transition from the ordered helical spin order to a cycloidal spin order below 228 K is clearly revealed by in situ neutron powder diffraction and Lorentz TEM, and is closely correlated with the temperature-induced variation in magneto-crystalline anisotropy. These results reveal the origin of spiral ordered spin textures in non-chiral centrosymmetric helimagnets, which can serve as a new strategy for searching for materials with nanoscale spin order and potential applications in emergent electromagnetism.

preprint 2022 arXiv

On Private Online Convex Optimization: Optimal Algorithms in $\ell_p$-Geometry and High Dimensional Contextual Bandits

Differentially private (DP) stochastic convex optimization (SCO) is ubiquitous in trustworthy machine learning algorithm design. This paper studies the DP-SCO problem with streaming data that are sampled from a distribution and arrive sequentially. We also consider the continual release model, where parameters related to private information are updated and released upon each new data point, a setting often referred to as online algorithms. Although numerous algorithms have been developed to achieve the optimal excess risks in different $\ell_p$ norm geometries, none of the existing ones can be adapted to the streaming and continual release setting. To address this challenge of online convex optimization with privacy protection, we propose a private variant of the online Frank-Wolfe algorithm with recursive gradients for variance reduction to update and reveal the parameters upon each data point. Combined with the adaptive differential privacy analysis, our online algorithm achieves in linear time the optimal excess risk when $1<p\leq 2$ and the state-of-the-art excess risk meeting the non-private lower ones when $2<p\leq\infty$. Our algorithm can also be extended to the case $p=1$ to achieve nearly dimension-independent excess risk. While previous variance reduction results on recursive gradients have theoretical guarantees only in the independent and identically distributed sample setting, we establish such a guarantee in a non-stationary setting. To demonstrate the virtues of our method, we design the first DP algorithm for high-dimensional generalized linear bandits with logarithmic regret. Comparative experiments with a variety of DP-SCO and DP-Bandit algorithms exhibit the efficacy and utility of the proposed algorithms.

preprint2022arXiv

Prompt Tuning for Discriminative Pre-trained Language Models

Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks. However, to the best of our knowledge, existing works focus on prompt-tuning generative PLMs that are pre-trained to generate target tokens, such as BERT. It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned. In this work, we present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem. Comprehensive experiments on text classification and question answering show that, compared with vanilla fine-tuning, DPT achieves significantly higher performance, and also prevents the unstable problem in tuning large PLMs in both full-set and low-resource settings. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/DPT.

preprint2022arXiv

Structure-Aware Flow Generation for Human Body Reshaping

Body reshaping is an important procedure in portrait photo retouching. Due to the complicated structure and multifarious appearance of human bodies, existing methods either fall back on the 3D domain via a body morphable model or resort to keypoint-based image deformation, leading to inefficiency and unsatisfactory visual quality. In this paper, we address these limitations by formulating an end-to-end flow generation architecture under the guidance of body structural priors, including skeletons and Part Affinity Fields, and achieve unprecedentedly controllable performance under arbitrary poses and garments. A compositional attention mechanism is introduced for capturing both visual perceptual correlations and structural associations of the human body to reinforce the manipulation consistency among related parts. For a comprehensive evaluation, we construct the first large-scale body reshaping dataset, namely BR-5K, which contains 5,000 portrait photos as well as professionally retouched targets. Extensive experiments demonstrate that our approach significantly outperforms existing state-of-the-art methods in terms of visual performance, controllability, and efficiency. The dataset is available at our website: https://github.com/JianqiangRen/FlowBasedBodyReshaping.

preprint2022arXiv

Tracking the nematicity in cuprate superconductors: a resistivity study under uniaxial pressure

Overshadowing the superconducting dome in hole-doped cuprates, the pseudogap state is still one of the mysteries on which no consensus has been reached. It has been suggested that the rotational symmetry is broken in this state and may result in a nematic phase transition, whose temperature seems to coincide with the onset temperature of the pseudogap state $T^*$ around the optimal doping level, raising the question of whether the pseudogap results from the establishment of the nematic order. Here we report results of resistivity measurements under uniaxial pressure on several hole-doped cuprates, where the normalized slope of the elastoresistivity $ζ$ can be obtained as illustrated in iron-based superconductors. The temperature dependence of $ζ$ along a particular lattice axis exhibits a kink feature at $T_{k}$ and shows Curie-Weiss-like behavior above it, which may suggest a spontaneous nematic transition. While $T_{k}$ seems to be the same as $T^*$ around the optimal doping and in the overdoped region, they become very different in underdoped La$_{2-x}$Sr$_{x}$CuO$_4$. Our results suggest that the nematic order, if it indeed exists, is an electronic phase within the pseudogap state.

preprint2022arXiv

Unsupervised Domain Adaptation through Shape Modeling for Medical Image Segmentation

Shape information is a strong and valuable prior in segmenting organs in medical images. However, most current deep learning based segmentation algorithms have not taken shape information into consideration, which can lead to bias towards texture. We aim to model shape explicitly and use it to help medical image segmentation. Previous methods proposed Variational Autoencoder (VAE) based models to learn the distribution of shape for a particular organ and used it to automatically evaluate the quality of a segmentation prediction by fitting it into the learned shape distribution. Building on this, we aim to incorporate VAE into current segmentation pipelines. Specifically, we propose a new unsupervised domain adaptation pipeline based on a pseudo loss and a VAE reconstruction loss under a teacher-student learning paradigm. Both losses are optimized simultaneously and, in return, boost the segmentation task performance. Extensive experiments on three public Pancreas segmentation datasets as well as two in-house Pancreas segmentation datasets show consistent improvements with a gain of at least 2.8 points in the Dice score, demonstrating the effectiveness of our method in challenging unsupervised domain adaptation scenarios for medical image segmentation. We hope this work will advance shape analysis and geometric learning in medical imaging.
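For readers unfamiliar with the metric quoted in this abstract, the Dice score can be sketched as the standard overlap ratio 2|A∩B| / (|A| + |B|) between a predicted and a reference binary mask. The function name `dice_score` and the convention of returning 1.0 for two empty masks are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of equal shape (range 0 to 1)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks count as perfect agreement
    # Twice the overlap divided by the combined mask sizes.
    return 2.0 * np.logical_and(pred, target).sum() / total
```

If, as is common, scores are reported as percentages, a gain of 2.8 points corresponds to 0.028 on this 0-to-1 scale.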

preprint2021arXiv

Evaluating Visual Properties via Robust HodgeRank

Nowadays, how to effectively evaluate visual properties has become a popular topic for fine-grained visual comprehension. In this paper we study the problem of how to estimate such visual properties from a ranking perspective with the help of the annotators from online crowdsourcing platforms. The main challenges of our task are two-fold. On one hand, the annotations often contain contaminated information, where a small fraction of label flips might ruin the global ranking of the whole dataset. On the other hand, considering the large data capacity, the annotations are often far from being complete. What is worse, there might even exist imbalanced annotations where a small subset of samples are frequently annotated. Facing such challenges, we propose a robust ranking framework based on the principle of Hodge decomposition of imbalanced and incomplete ranking data. According to the HodgeRank theory, we find that the major source of the contamination comes from the cyclic ranking component of the Hodge decomposition. This leads us to an outlier detection formulation as sparse approximations of the cyclic ranking projection. Taking a step further, it facilitates a novel outlier detection model as Huber's LASSO in robust statistics. Moreover, simple yet scalable algorithms are developed based on Linearized Bregman Iteration to achieve an even less biased estimator. Statistical consistency of outlier detection is established in both cases under nearly the same conditions. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ranking with large scale crowdsourcing data arising from computer vision.
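The ranking step that this abstract builds on can be illustrated with a minimal sketch of plain least-squares HodgeRank: pairwise comparisons are fitted by score differences, and whatever the fit cannot explain is the residual (cyclic) part in which, according to the abstract, contamination concentrates. The sketch omits the outlier-detection and Linearized Bregman parts of the paper; the function name `hodgerank` and the `(i, j, margin)` input format are hypothetical.

```python
import numpy as np

def hodgerank(n_items, comparisons):
    """Least-squares global scores from pairwise comparisons.

    comparisons: list of (i, j, margin), read as "score[i] - score[j] ~ margin"
    (hypothetical input format chosen for this sketch).
    Returns the centred scores and the residual that no global ranking explains.
    """
    D = np.zeros((len(comparisons), n_items))  # one row per comparison
    y = np.empty(len(comparisons))
    for k, (i, j, margin) in enumerate(comparisons):
        D[k, i], D[k, j] = 1.0, -1.0
        y[k] = margin
    scores, *_ = np.linalg.lstsq(D, y, rcond=None)
    scores = scores - scores.mean()  # scores are only defined up to a constant
    residual = y - D @ scores        # inconsistent (cyclic) part of the data
    return scores, residual
```

Consistent comparisons give a zero residual, whereas a pure cycle (0 beats 1, 1 beats 2, 2 beats 0) gives all-zero scores and a residual equal to the data itself.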

preprint2021arXiv

Fast differentiable evolution of quantum states under Gaussian transformations

In a recent work we presented a recursive algorithm to compute the matrix elements of a generic Gaussian transformation in the photon-number basis. Its purpose was to evolve a quantum state by building the transformation matrix and subsequently computing the matrix-vector product. Here we present a faster algorithm that computes the final state without having to generate the full transformation matrix first. With this algorithm we bring the time complexity of computing the Gaussian evolution of an $N$-dimensional $M$-mode state from $O(MN^{2M})$ to $O(M(N^2/2)^M)$, which is an exponential improvement in the number of modes. In the special case of high squeezing, the evolved state can be approximated with complexity $O(MN^{M})$. Our new algorithm is differentiable, which means we can use it in conjunction with gradient-based optimizers for circuit optimization tasks. We benchmark our algorithm by optimizing circuits to produce single photons, Gottesman-Kitaev-Preskill states and NOON states, showing that it is up to one order of magnitude faster than the state of the art.
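The size of the improvement quoted in this abstract follows directly from the ratio of the two stated complexities, worked out here with the constants hidden by the $O(\cdot)$ notation ignored:

```latex
\frac{M N^{2M}}{M \left(N^{2}/2\right)^{M}}
  = \frac{N^{2M}}{N^{2M}\, 2^{-M}}
  = 2^{M},
\qquad
\frac{M N^{2M}}{M N^{M}} = N^{M}.
```

The first ratio is the speedup of the new algorithm over building the full transformation matrix, for example $2^4 = 16$ for $M = 4$ modes; the second is the corresponding ratio for the high-squeezing approximation.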

preprint2021arXiv

Natural Gradient Optimization for Optical Quantum Circuits

Optical quantum circuits can be optimized using gradient descent methods, as the gates in a circuit can be parametrized by continuous parameters. However, the parameter space as seen by the cost function is not Euclidean, which means that the Euclidean gradient does not generally point in the direction of steepest ascent. In order to retrieve the steepest ascent direction, in this work we implement Natural Gradient descent in the optical quantum circuit setting, which takes the local metric tensor into account. In particular, we adapt the Natural Gradient approach to a complex-valued parameter space. We then compare the Natural Gradient approach to vanilla gradient descent and to Adam over two state preparation tasks: a single-photon source and a Gottesman-Kitaev-Preskill state source. We observe that the NG approach has a faster convergence (due in part to the possibility of using larger learning rates) and a significantly smoother decay of the cost function throughout the optimization.
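As a reminder of what taking the local metric tensor into account means, the textbook natural-gradient update for real-valued parameters is sketched below. The symbols $\theta$ (circuit parameters), $\eta$ (learning rate), $L$ (cost function) and $g$ (metric tensor) are notation assumed here, and the complex-valued adaptation described in the abstract is not reproduced.

```latex
\theta_{t+1} = \theta_t - \eta\, g(\theta_t)^{-1}\, \nabla_{\theta} L(\theta_t),
\qquad
\text{vanilla gradient descent: } g(\theta_t) = I .
```

Setting the metric to the identity recovers ordinary gradient descent, which is why the two methods coincide only when the parameter space is Euclidean.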

preprint2021arXiv

On Stochastic Variance Reduced Gradient Method for Semidefinite Optimization

Low-rank stochastic semidefinite optimization has attracted rising attention due to its wide range of applications. The nonconvex reformulation based on low-rank factorization significantly improves computational efficiency but brings new challenges to the analysis. The stochastic variance reduced gradient (SVRG) method has been regarded as one of the most effective methods. SVRG in general consists of two loops, where a reference full gradient is first evaluated in the outer loop and then used to yield a variance reduced estimate of the current gradient in the inner loop. Two options have been suggested to yield the output of the inner loop, where Option I sets the output as its last iterate, and Option II yields the output via random sampling from all the iterates in the inner loop. However, there is a significant gap between the theory and practice of SVRG when adapted to stochastic semidefinite programming (SDP). SVRG practically works better with Option I, while most existing theoretical results focus on Option II. In this paper, we fill this gap by exploiting a new semi-stochastic variant of the original SVRG with Option I adapted to semidefinite optimization. Equipped with this, we establish the global linear submanifold convergence (i.e., converging exponentially fast to a submanifold of a global minimum under the orthogonal group action) of the proposed SVRG method, given a provable initialization scheme and under certain smoothness and restricted strong convexity assumptions. Our analysis includes the effects of the mini-batch size and update frequency in the inner loop as well as two practical step size strategies, the fixed and stabilized Barzilai-Borwein step sizes. Numerical results in matrix sensing demonstrate the efficiency of the proposed SVRG method, which outperforms its Option II counterpart as well as other methods.
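The two-loop structure and the two output options described in this abstract can be sketched on a toy problem. The example below runs generic SVRG on a least-squares objective, which is a stand-in chosen for brevity and not the paper's semi-stochastic variant for semidefinite optimization; the function name `svrg` and all parameter names are hypothetical.

```python
import numpy as np

def svrg(A, b, step, n_outer, n_inner, option="I", seed=0):
    """Toy SVRG for f(x) = (1/2n) * ||A x - b||^2 (illustrative stand-in only)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_ref = np.zeros(d)
    for _ in range(n_outer):
        # Outer loop: reference full gradient at the current snapshot.
        full_grad = A.T @ (A @ x_ref - b) / n
        x = x_ref.copy()
        iterates = []
        for _ in range(n_inner):
            i = rng.integers(n)
            g_x = A[i] * (A[i] @ x - b[i])          # sample gradient at x
            g_ref = A[i] * (A[i] @ x_ref - b[i])    # same sample at the snapshot
            x = x - step * (g_x - g_ref + full_grad)  # variance-reduced step
            iterates.append(x.copy())
        if option == "I":
            x_ref = iterates[-1]                    # Option I: last inner iterate
        else:
            x_ref = iterates[rng.integers(n_inner)]  # Option II: random inner iterate
    return x_ref
```

The only difference between the options is the line that picks the next snapshot, which is why Option I is the natural choice in practice: it never discards progress made late in the inner loop.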

preprint2021arXiv

Particle-hole symmetry breaking in a spin-dimer system TlCuCl$_3$ observed at 100 T

The entire magnetization process of TlCuCl$_3$ has been experimentally investigated up to 100 T employing the single-turn technique. The upper critical field $H_{c2}$ is observed to be 86.1 T at 2 K. A convex slope of the $M$-$H$ curve between the lower and upper critical fields ($H_{c1}$ and $H_{c2}$) is clearly observed, which indicates that a particle-hole symmetry is broken in TlCuCl$_3$. By quantum Monte Carlo simulation and the bond-operator theory method, we find that the particle-hole symmetry breaking results from strong inter-dimer interactions.

preprint2021arXiv

Polyimide-Based Flexible Coupled-Coils Design and Load-Shift Keying Analysis

Wireless power transfer using inductive coupling is commonly used for medical implantable devices. The design of the secondary coil on the implantable device is important as it will affect the power transfer efficiency, the size of the implant, and also the data transmission between the implant and the in-vitro controller. In this paper, we present a design of the secondary coil on a polyimide-based flexible substrate to achieve high power transfer efficiency. Load shift keying modulation is used for the data communication between the primary and secondary coils. A thorough analysis is done for the ideal and practical scenario and it shows that a mismatched secondary LC tank will affect the communication range and communication correctness. A solution to achieve robust data transmission is proposed and then verified by SPICE simulations.

preprint2021arXiv

Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics

Margin enlargement over training data has been an important strategy since perceptrons in machine learning for the purpose of boosting the robustness of classifiers toward a good generalization ability. Yet Breiman (1999) showed a dilemma that a uniform improvement on margin distribution does NOT necessarily reduce generalization errors. In this paper, we revisit Breiman's dilemma in deep neural networks with recently proposed spectrally normalized margins, from a novel perspective based on phase transitions of normalized margin distributions in training dynamics. The normalized margin distribution of a classifier over the data can be divided into two parts: low/small margins such as some negative margins for misclassified samples vs. high/large margins for high-confidence correctly classified samples, which often behave differently during the training process. Low margins for training and test datasets are often effectively reduced in training, along with reductions of training and test errors; while high margins may exhibit different dynamics, reflecting the trade-off between expressive power of models and complexity of data. When data complexity is comparable to the model expressiveness, high margin distributions for both training and test data undergo similar decrease-increase phase transitions during training. In such cases, one can predict the trend of generalization or test error by margin-based generalization bounds with restricted Rademacher complexities, shown in two ways in this paper with early stopping time exploiting such phase transitions. On the other hand, over-expressive models may have both low and high training margins undergoing uniform improvements, with a distinct phase transition in test margin dynamics. This reconfirms Breiman's dilemma associated with overparameterized neural networks where margins fail to predict overfitting.

preprint2021arXiv

StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

The generation of stylish Chinese fonts is an important problem involved in many applications. Most existing generation methods are based on deep generative models, particularly generative adversarial network (GAN) based models. However, these deep generative models may suffer from the mode collapse issue, which significantly degrades the diversity and quality of generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and then incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. As a result we propose an efficient method called StrokeGAN, mainly motivated by the observation that the stroke encoding contains a large amount of mode information of Chinese characters. In order to reconstruct the one-bit stroke encoding of the associated generated characters, we introduce a stroke-encoding reconstruction loss imposed on the discriminator. Equipped with such one-bit stroke encoding and stroke-encoding reconstruction loss, the mode collapse issue of CycleGAN can be significantly alleviated, with an improved preservation of strokes and diversity of generated characters. The effectiveness of StrokeGAN is demonstrated by a series of generation tasks over nine datasets with different fonts. The numerical results demonstrate that StrokeGAN generally outperforms the state-of-the-art methods in terms of content and recognition accuracies, as well as certain stroke error, and also generates more realistic characters.

preprint2021arXiv

UPRec: User-Aware Pre-training for Recommender Systems

Existing sequential recommendation methods rely on large amounts of training data and usually suffer from the data sparsity problem. To tackle this, the pre-training mechanism has been widely adopted, which attempts to leverage large-scale data to perform self-supervised learning and transfer the pre-trained parameters to downstream tasks. However, previous pre-trained models for recommendation focus on leveraging universal sequence patterns from user behaviour sequences and item information, while ignoring the personalized interests captured by heterogeneous user information, which has been shown to be effective in contributing to personalized recommendation. In this paper, we propose a method to enhance pre-trained models with heterogeneous user information, called User-aware Pre-training for Recommendation (UPRec). Specifically, UPRec leverages the user attributes and structured social graphs to construct self-supervised objectives in the pre-training stage and proposes two user-aware pre-training tasks. Comprehensive experimental results on several real-world large-scale recommendation datasets demonstrate that UPRec can effectively integrate user information into pre-trained models and thus provide more appropriate recommendations for users.

preprint2020arXiv

$\textit{Ab Initio}$ Mismatched Interface Theory of Graphene on $α$-RuCl$_3$: Doping and Magnetism

Recent developments in twisted and lattice-mismatched bilayers have revealed a rich phase space of van der Waals systems and generated excitement. Among these systems are heterobilayers which can offer new opportunities to control van der Waals systems with strong in plane correlations such as spin-orbit-assisted Mott insulator $α$-RuCl$_3$. Nevertheless, a theoretical $\textit{ab initio}$ framework for mismatched heterobilayers without even approximate periodicity is sorely lacking. We propose a general strategy for calculating electronic properties of such systems, mismatched interface theory (MINT), and apply it to the graphene/$α$-RuCl$_{3}$ (GR/$α$-RuCl$_{3}$) heterostructure. Using MINT, we predict uniform doping of 4.77$\%$ from graphene to $α$-RuCl$_3$ and magnetic interactions in $α$-RuCl$_3$ to shift the system toward the Kitaev point. Hence we demonstrate that MINT can guide targeted materialization of desired model systems and discuss recent experiments on GR/$α$-RuCl$_{3}$ heterostructures.

preprint2020arXiv

$α$ Decay Half-life Estimation and Uncertainty Analysis

The non-parametric bootstrap method is used to evaluate the uncertainties of two $α$ decay formulas, the universal decay law (UDL) and the new Geiger-Nuttall law (NGNL). Such a method can simultaneously obtain the uncertainty of each parameter, the correlation between each pair of parameters, and the total, statistical, and systematic uncertainties of each formula. Both even-even (ee) nuclei and odd-A (oA) nuclei are used in the analysis. The collected data are separated into three parts: ee nuclei, oA nuclei without spin or parity change (oA\_nc), and oA nuclei with spin and/or parity change (oA\_c). Based on the residues between observed data and corresponding calculations, the statistical and systematic uncertainties are decomposed from the total uncertainty, from which one can clarify the effects of the shell structure, pairing, and angular momentum change on describing $α$ decay half-life. If $N > 126$ and $N \leqslant 126$ nuclei are considered together, the systematic uncertainty of residues between observed and predicted half-lives is larger than if those groups are considered separately. Without a shell correction term, a much larger systematic uncertainty is found if parameters obtained for $N \leqslant 126$ nuclei are used to describe the half-lives of $N > 126$ nuclei. A global hindrance on the $α$ decay process is found in oA\_nc (oA\_c) nuclei compared with ee (oA\_nc) nuclei. If parameters obtained from ee (oA\_nc) nuclei are used, the half-lives of oA\_nc (oA\_c) nuclei are generally underestimated with large systematic uncertainties, which can be related to the contribution of the pairing effect and angular momentum. The recently observed superallowed decay from $^{104}$Te to $^{100}$Sn is also discussed based on uncertainty analysis. (Abstract is not fully presented because of length limitation)
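How a non-parametric bootstrap yields parameter uncertainties and parameter correlations at the same time can be sketched with a simple straight-line fit. The example uses a generic linear model rather than the UDL or NGNL formulas, and the function name `bootstrap_fit` and its arguments are hypothetical.

```python
import numpy as np

def bootstrap_fit(x, y, n_boot=2000, seed=0):
    """Bootstrap a straight-line fit y ~ slope * x + intercept.

    Returns the mean and standard deviation of (slope, intercept) over the
    resampled fits, plus their 2x2 correlation matrix.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([x, np.ones(n)])
    params = np.empty((n_boot, 2))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample data points with replacement
        params[k], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return params.mean(axis=0), params.std(axis=0, ddof=1), np.corrcoef(params.T)
```

Because every resample refits all parameters jointly, the spread of the refitted values gives each parameter's uncertainty and their co-variation gives the pairwise correlation, with no distributional assumption on the residuals.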

preprint2020arXiv

A generalized boundary condition applied to Lieb-Schultz-Mattis type ingappabilities and many-body Chern numbers

We introduce a new boundary condition which renders the flux-insertion argument for the Lieb-Schultz-Mattis type theorems in two or higher dimensions free from the specific choice of system sizes. It also enables a formulation of the Lieb-Schultz-Mattis type theorems in arbitrary dimensions in terms of the anomaly in field theories of $1+1$ dimensions with a bulk correspondence as a BF-theory in 2+1 dimensions. Furthermore, we apply the anomaly-based formulation to the constraints on a half-filled spinless fermion on a square lattice with $π$ flux, utilizing time-reversal, the magnetic translation and on-site internal $U(N)$ symmetries. This demonstrates the role of time-reversal anomaly on the ingappabilities of a lattice model.

preprint2020arXiv

Accurate many-body electronic structure near the basis set limit: application to the chromium dimer

We describe a method for computing near-exact energies for correlated systems with large Hilbert spaces. The method efficiently identifies the most important basis states (Slater determinants) and performs a variational calculation in the subspace spanned by these determinants. A semistochastic approach is then used to add a perturbative correction to the variational energy to compute the total energy. The size of the variational space is progressively increased until the total energy converges to within the desired tolerance. We demonstrate the power of the method by computing a near-exact potential energy curve (PEC) for a very challenging molecule -- the chromium dimer.

preprint2020arXiv

Boosting Semantic Human Matting with Coarse Annotations

Semantic human matting aims to estimate the per-pixel opacity of the foreground human regions. It is quite challenging and usually requires user interactive trimaps and plenty of high quality annotated data. Annotating this kind of data is labor-intensive and requires skills beyond those of normal users, especially considering the very detailed hair part of humans. In contrast, coarsely annotated human data are much easier to acquire and collect from public datasets. In this paper, we propose to use coarse annotated data coupled with fine annotated data to boost end-to-end semantic human matting without trimaps as extra input. Specifically, we train a mask prediction network to estimate the coarse semantic mask using the hybrid data, and then propose a quality unification network to unify the quality of the previous coarse mask outputs. A matting refinement network takes in the unified mask and the input image to predict the final alpha matte. The collected coarse annotated dataset enriches our dataset significantly and allows generating high quality alpha mattes for real images. Experimental results show that the proposed method performs comparably against state-of-the-art methods. Moreover, the proposed method can be used for refining coarse annotated public datasets, as well as semantic segmentation methods, which reduces the cost of annotating high quality human data to a great extent.

preprint2020arXiv

Chemistry of the spin-1/2 kagome Heisenberg antiferromagnet

We believe that a necessary first step in understanding the ground state properties of the spin-${\scriptstyle\frac{1}{2}}$ kagome Heisenberg antiferromagnet is a better understanding of this model's very large number of low energy singlet states. A description of the low energy states that is both accurate and amenable for numerical work may ultimately prove to have greater value than knowing only what these properties are, in particular when these turn on the delicate balance of many small energies. We demonstrate how this program would be implemented using the basis of spin-singlet dimerized states, though other bases that have been proposed may serve the same purpose. The quality of a basis is evaluated by its participation in all the low energy singlets, not just the ground state. From an experimental perspective, and again in light of the small energy scales involved, methods that can deliver all the low energy states promise more robust predictions than methods that only refine a fraction of these states.

preprint2020arXiv

DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths

Over-parameterization is ubiquitous nowadays in training neural networks to benefit both optimization in seeking global optima and generalization in reducing prediction error. However, compressive networks are desired in many real world applications and direct training of small networks may be trapped in local optima. In this paper, instead of pruning or distilling over-parameterized models to compressive ones, we propose a new approach based on differential inclusions of inverse scale spaces. Specifically, it generates a family of models from simple to complex ones that couples a pair of parameters to simultaneously train over-parameterized deep models and structural sparsity on weights of fully connected and convolutional layers. Such a differential inclusion scheme has a simple discretization, proposed as Deep structurally splitting Linearized Bregman Iteration (DessiLBI), whose global convergence analysis in deep learning establishes that, from any initialization, the algorithmic iterations converge to a critical point of the empirical risk. Experimental evidence shows that DessiLBI achieves comparable and even better performance than the competitive optimizers in exploring the structural sparsity of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets" in early epochs: the effective sparse structure with comparable test accuracy to fully trained over-parameterized models.

preprint2020arXiv

Efficient Estimation For The Cox Proportional Hazards Cure Model

While analysing time-to-event data, it is possible that a certain fraction of subjects will never experience the event of interest and they are said to be cured. When this feature of survival models is taken into account, the models are commonly referred to as cure models. In the presence of covariates, the conditional survival function of the population can be modelled by using a cure model which depends on the probability of being uncured (incidence) and the conditional survival function of the uncured subjects (latency), and a combination of logistic regression and Cox proportional hazards (PH) regression is used to model the incidence and latency respectively. In this paper, we show the asymptotic normality of the profile likelihood estimator via asymptotic expansion of the profile likelihood and obtain the explicit form of the variance estimator with an implicit function in the profile likelihood. We also show that the efficient score function based on projection theory and the profile likelihood score function are equal. Our contribution in this paper is that we express the efficient information matrix as the variance of the profile likelihood score function. A simulation study suggests that the estimated standard errors from bootstrap samples (SMCURE package) and the profile likelihood score function (our approach) provide similar and comparable results. The numerical result of our proposed method is also shown by using the melanoma data from the SMCURE R-package (Cai et al., 2012) and we compare the results with the output obtained from the SMCURE package.
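The incidence and latency decomposition described in this abstract is usually written as the mixture cure model below. The notation is assumed here rather than taken from the paper: $z$ and $x$ are the covariates for incidence and latency, $\pi(z)$ is the probability of being uncured, $S_u$ is the survival function of the uncured subjects, and $S_0$ is the baseline survival function.

```latex
S_{\mathrm{pop}}(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x),
\qquad
\pi(z) = \frac{\exp(\gamma^{\top} z)}{1 + \exp(\gamma^{\top} z)},
\qquad
S_u(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)} .
```

The logistic term models incidence and the Cox PH term models latency; as $t \to \infty$ the population survival tends to the cure fraction $1 - \pi(z)$ instead of zero.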

preprint2020arXiv

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces. Recent methods address this challenge through the use of largely unstructured neural networks that effectively distill conditional mapping and priors over 3D shape. In this work, we induce structure and geometric constraints by leveraging three core observations: (1) the surface of most everyday objects is often almost entirely exposed from pairs of typical opposite views; (2) everyday objects often exhibit global reflective symmetries which can be accurately predicted from single views; (3) opposite orthographic views of a 3D shape share consistent silhouettes. Following these observations, we first predict orthographic 2.5D visible surface maps (depth, normal and silhouette) from perspective 2D images, and detect global reflective symmetries in this data; second, we predict the back facing depth and normal maps using as input the front maps and, when available, the symmetric reflections of these maps; and finally, we reconstruct a 3D mesh from the union of these maps using a surface reconstruction method best suited for this data. Our experiments demonstrate that our framework outperforms state-of-the-art approaches for 3D shape reconstruction from 2D and 2.5D data in terms of input fidelity and detail preservation. Specifically, we achieve 12% better performance on average on the ShapeNet benchmark dataset, and up to 19% for certain classes of objects (e.g., chairs and vessels).

preprint2020arXiv

Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect

Recommender systems aim to provide item recommendations for users, and are usually faced with the data sparsity problem (e.g., cold start) in real-world scenarios. Recently, pre-trained models have shown their effectiveness in knowledge transfer between domains and tasks, which can potentially alleviate the data sparsity problem in recommender systems. In this survey, we first provide a review of recommender systems with pre-training. In addition, we show the benefits of pre-training to recommender systems through experiments. Finally, we discuss several promising directions for future research for recommender systems with pre-training.

preprint2020arXiv

Large anomalous Hall effect in a hexagonal ferromagnetic Fe5Sn3 single crystal

In this paper, we report an experimental observation of the large anomalous Hall effect (AHE) in a hexagonal ferromagnetic Fe$_5$Sn$_3$ single crystal with current along the b axis and a magnetic field normal to the bc plane. The intrinsic contribution of the anomalous Hall conductance $\sigma_{AH}^{int}$ was approximately 613 Ω$^{-1}$ cm$^{-1}$, which was more than 3 times the maximum value in the frustrated kagome magnet Fe$_3$Sn$_2$ and nearly independent of the temperature over a wide range between 5 and 350 K. The analysis results revealed that the large AHE was dominated by a common, intrinsic term, while the extrinsic contribution, i.e., the skew scattering and side jump, turned out to be small. In addition to the large AHE, it was found that the types of majority carriers changed at approximately 275 and 30 K, consistent with the critical temperatures of the spin reorientation. These findings suggest that the hexagonal ferromagnetic Fe$_5$Sn$_3$ single crystal is an excellent candidate to use for the study of the topological features in ferromagnets.

preprint2020arXiv

Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of finding the needle in a haystack

The task of using machine learning to approximate the mapping $\mathbf{x}\mapsto\sum_{i=1}^d x_i^2$ with $x_i\in[-1,1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network is similar to finding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples is needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as $d$ increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}(d^{2.5})$, which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}(d^{4})$. Without explicit regularization (also called implicit regularization), the numerically observed sample complexity is significantly higher and is close to $\mathcal{O}(d^{4.5})$.

preprint2020arXiv

Leveraging both Lesion Features and Procedural Bias in Neuroimaging: An Dual-Task Split dynamics of inverse scale space

The prediction and selection of lesion features are two important tasks in voxel-based neuroimage analysis. Existing multivariate learning models treat the two tasks as equivalent and optimize them simultaneously. However, in addition to lesion features, we observe that there is another type of feature, commonly introduced during the preprocessing steps, which can improve the prediction result. We call this type of feature procedural bias. Therefore, in this paper, we propose that the features/voxels in neuroimage data consist of three orthogonal parts: lesion features, procedural bias, and null features. To stably select lesion features and leverage procedural bias in prediction, we propose an iterative algorithm (termed GSplit LBI) as a discretization of the differential inclusion of inverse scale space, which combines a Variable Splitting scheme with Linearized Bregman Iteration (LBI). Specifically, with a variable splitting term, two estimators are introduced and split apart, i.e., one for feature selection (the sparse estimator) and the other for prediction (the dense estimator). Implemented with LBI, the solution path of both estimators can be returned with different sparsity levels on the sparse estimator for the selection of lesion features. In addition, the dense estimator can leverage procedural bias to further improve prediction results. To test the efficacy of our method, we conduct experiments on a simulated study and the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The validity and the benefit of our model are shown by the improvement of prediction results and the interpretability of the visualized procedural bias and lesion features.

preprint2020arXiv

Nonlinear parameter-gauge coupling approach to generalization of generalized Thouless pumps and $-1$-form anomaly

We study the nontrivial topology of the parameter space of general $U(1)$-symmetric fermionic non-degenerately gapped systems and its consequences for the transport properties in arbitrary dimensions. By a nonlinear parameter-gauge topological response theory, we find that such nontrivial topology can impose quantization constraints on the charge transport in the presence of background fluxes or, more generally, instantons in general dimensions. Our result generalizes the Thouless pump and its higher-dimensional generalizations. We also show that these nontrivial transport properties are related to an unconventional quantum anomaly, which generalizes $-1$-form anomalies. This anomaly imposes non-perturbative ingappabilities of various types of spatial interfaces or time-dependent system evolution.

preprint2020arXiv

Self-controlled growth of highly uniform Ge/Si hut wires for scalable qubit devices

Semiconductor nanowires have been playing a crucial role in the development of nanoscale devices for the realization of spin qubits, Majorana fermions, single photon emitters, nanoprocessors, etc. The monolithic growth of site-controlled nanowires is a prerequisite towards the next generation of devices that will require addressability and scalability. Here, combining top-down nanofabrication and bottom-up self-assembly, we report on the growth of Ge wires on pre-patterned Si (001) substrates with controllable position, distance, length and structure. This is achieved by a novel growth process which uses a SiGe strain-relaxation template and can be generalized to other material combinations. Transport measurements show an electrically tunable spin-orbit coupling, with a spin-orbit length similar to that of III-V materials. Also, capacitive coupling between closely spaced wires is observed, which underlines their potential as a host for implementing two qubit gates. The reported results open a path towards scalable qubit devices with Si compatibility.

preprint2020arXiv

Two-photon interference: the Hong-Ou-Mandel effect

Nearly 30 years ago, two-photon interference was observed, marking the beginning of a new quantum era. Indeed, two-photon interference has no classical analogue, giving it a distinct advantage for a range of applications. The peculiarities of quantum physics may now be used to our advantage to outperform classical computations, securely communicate information, simulate highly complex physical systems and increase the sensitivity of precise measurements. This separation from classical to quantum physics has motivated physicists to study two-particle interference for both fermionic and bosonic quantum objects. So far, two-particle interference has been observed with massive particles, among others, such as electrons and atoms, in addition to plasmons, demonstrating the extent of this effect to larger and more complex quantum systems. A wide array of novel applications to this quantum effect is to be expected in the future. This review will thus cover the progress and applications of two-photon (two-particle) interference over the last three decades.

preprint2020arXiv

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

In self-supervised spatio-temporal representation learning, the temporal resolution and long-short term characteristics are not yet fully explored, which limits the representation capabilities of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple-yet-effective way. PRP is rooted in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows a feature encoder and favors low temporal resolution and long-term representation by classifying fast-forward rates. The generative perception model acts as a feature decoder and focuses on high temporal resolution and short-term representation by introducing a motion-attention mechanism. PRP is applied to typical video target tasks including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models by significant margins. Code is available at github.com/yuanyao366/PRP

preprint2019arXiv

Direct comparison of many-body methods for realistic electronic Hamiltonians

A large collaboration carefully benchmarks 20 first principles many-body electronic structure methods on a test set of 7 transition metal atoms, and their ions and monoxides. Good agreement is attained between the 3 systematically converged methods, resulting in experiment-free reference values. These reference values are used to assess the accuracy of modern emerging and scalable approaches to the many-electron problem. The most accurate methods obtain energies indistinguishable from experimental results, with the agreement mainly limited by the experimental uncertainties. Comparison between methods enables a unique perspective on calculations of many-body systems of electrons.

preprint2019arXiv

Observation of Magnetic Skyrmion Bubbles in a van der Waals ferromagnet Fe3GeTe2

Two-dimensional (2D) van der Waals (vdW) magnetic materials have recently been introduced as a new horizon in materials science and enable potential applications in next-generation spintronic devices. In this communication, the observation of stable Bloch-type magnetic skyrmions in single crystals of 2D vdW Fe3GeTe2 (FGT) is reported using in-situ Lorentz transmission electron microscopy (TEM). We find that the ground-state magnetic stripe domains in FGT transform into skyrmion bubbles when an external magnetic field is applied perpendicular to the (001) thin plate at temperatures below the Curie temperature TC. Most interestingly, a hexagonal lattice of skyrmion bubbles is obtained via field-cooling manipulation with the magnetic field applied along the [001] direction. Owing to their topological stability, the skyrmion bubble lattices are stable up to large field-cooling tilt angles and are further reproduced by micromagnetic simulations. These observations directly demonstrate that 2D vdW FGT possesses a rich variety of topological spin textures, making it a promising candidate for future applications in the field of spintronics.

preprint2018arXiv

MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning

Learning a good embedding model to efficiently learn the representation coefficients between two spaces/subspaces is a typical and general topic. To solve this task, $L_{1}$ regularization is widely used for feature selection and for avoiding overfitting, yet the sparse estimation of features in $L_{1}$ regularization may cause underfitting of the training data. $L_{2}$ regularization is also frequently used, but it is a biased estimator. In this paper, we propose the idea that the features consist of three orthogonal parts, namely sparse strong signals, dense weak signals and random noise, in which both strong and weak signals contribute to the fitting of data. To facilitate such a novel decomposition, MSplit LBI is proposed for the first time to realize feature selection and dense estimation simultaneously. We provide theoretical and simulation-based verification that our method exceeds $L_{1}$ and $L_{2}$ regularization, and extensive experimental results show that our method achieves state-of-the-art performance in few-shot and zero-shot learning.

preprint2016arXiv

A Tutorial on Libra: R package for the Linearized Bregman Algorithm in High Dimensional Statistics

The R package, Libra, stands for the LInearized BRegman Algorithm in high dimensional statistics. The Linearized Bregman Algorithm is a simple iterative procedure to generate sparse regularization paths of model estimation, which are firstly discovered in applied mathematics for image restoration and particularly suitable for parallel implementation in large scale problems. The limit of such an algorithm is a sparsity-restricted gradient descent flow, called the Inverse Scale Space, evolving along a parsimonious path of sparse models from the null model to overfitting ones. In sparse linear regression, the dynamics with early stopping regularization can provably meet the unbiased Oracle estimator under nearly the same condition as LASSO, while the latter is biased. Despite their successful applications, statistical consistency theory of such dynamical algorithms remains largely open except for some recent progress on linear regression. In this tutorial, algorithmic implementations in the package are discussed for several widely used sparse models in statistics, including linear regression, logistic regression, and several graphical models (Gaussian, Ising, and Potts). Besides the simulation examples, various application cases are demonstrated, with real world datasets from diabetes, publications of COPSS award winners, as well as social networks of two Chinese classic novels, Journey to the West and Dream of the Red Chamber.

preprint2016arXiv

Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs

Crowdsourcing platforms are now extensively used for conducting subjective pairwise comparison studies. In this setting, a pairwise comparison dataset is typically gathered via random sampling, either with or without replacement. In this paper, we use tools from random graph theory to analyze these two random sampling methods for the HodgeRank estimator. Using the Fiedler value of the graph as a measurement for estimator stability (informativeness), we provide a new estimate of the Fiedler value for these two random graph models. In the asymptotic limit as the number of vertices tends to infinity, we prove the validity of the estimate. Based on our findings, for a small number of items to be compared, we recommend a two-stage sampling strategy where a greedy sampling method is used initially and random sampling without replacement is used in the second stage. When a large number of items is to be compared, we recommend random sampling with replacement as this is computationally inexpensive and trivially parallelizable. Experiments on synthetic and real-world datasets support our analysis.

preprint2016arXiv

False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking

With the rapid growth of crowdsourcing platforms, it has become easy and relatively inexpensive to collect a dataset labeled by multiple annotators in a short time. However, due to the lack of control over the quality of the annotators, some abnormal annotators may be affected by position bias, which can potentially degrade the quality of the final consensus labels. In this paper, we introduce a statistical framework to model and detect annotators' position bias while controlling the false discovery rate (FDR) without a priori knowledge of the number of biased annotators; that is, the expected fraction of false discoveries among all discoveries is kept from being too high, to ensure that most of the discoveries are indeed true and replicable. The key technical development relies on some new knockoff filters adapted to our problem and new algorithms based on the Inverse Scale Space dynamics, whose discretization is potentially suitable for large-scale crowdsourcing data analysis. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides a useful tool for quantitatively studying annotators' abnormal behavior in crowdsourcing data arising from machine learning, sociology, computer vision, multimedia, etc.

preprint2016arXiv

Optical frequency divider with division uncertainty at the 10^(-21) level

Optical clocks with unprecedented accuracy of $10^{-18}$ will lead to innovations in many research areas. All the applications of optical clocks rely on the ability to precisely convert the frequency from one optical clock to another, or particularly to the frequencies in the fiber telecom band for long-distance transmission. Here, we report a low-noise, high-precision optical frequency divider. It can realize accurate optical frequency conversion as well as enable precise measurement of optical frequency ratios. By comparing against the frequency ratio between the fundamental and the second harmonic of a 1064 nm laser rather than a second similar system, the optical frequency divider is demonstrated to have a frequency division instability of $6\times10^{-19}$ at 1 s and a fractional frequency division uncertainty of $1.4\times10^{-21}$, nearly three orders of magnitude better than the most accurate optical clocks. It allows optical clocks to be accessible to many precision measurement applications.

preprint2016arXiv

Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation

In crowdsourced preference aggregation, it is often assumed that all the annotators are subject to a common preference or utility function which generates their comparison behaviors in experiments. However, in reality annotators are subject to variations due to multi-criteria, abnormal, or a mixture of such behaviors. In this paper, we propose a parsimonious mixed-effects model based on HodgeRank, which takes into account both the fixed effect, that the majority of annotators follow a common linear utility model, and the random effect, that a small subset of annotators might deviate significantly from the common model and exhibit strongly personalized preferences. HodgeRank has been successfully applied to subjective quality evaluation of multimedia and resolves pairwise crowdsourced ranking data into a global consensus ranking and cyclic conflicts of interests. As an extension, our proposed methodology further explores the conflicts of interests through the random effect in annotator-specific variations. The key algorithm in this paper establishes a dynamic path from the common utility to individual variations, with different levels of parsimony or sparsity on personalization, based on newly developed Linearized Bregman Algorithms with the Inverse Scale Space method. Finally, the validity of the methodology is supported by experiments with both simulated examples and three real-world crowdsourcing datasets, which show that our proposed method exhibits better performance (i.e., smaller test error) compared with HodgeRank due to its parsimonious property.

preprint2016arXiv

Sparse Recovery via Differential Inclusions

In this paper, we recover sparse signals from their noisy linear measurements by solving nonlinear differential inclusions, an approach based on the notion of inverse scale space (ISS) developed in applied mathematics. Our goal here is to bring this idea to address a challenging problem in statistics, i.e., finding the oracle estimator, which is unbiased and sign-consistent, using dynamics. We call our dynamics Bregman ISS and Linearized Bregman ISS. A well-known shortcoming of LASSO and any convex regularization approach lies in the bias of estimators. However, we show that under proper conditions, there exists a bias-free and sign-consistent point on the solution paths of such dynamics, which corresponds to a signal that is the unbiased estimate of the true signal and whose entries have the same signs as those of the true signal, i.e., the oracle estimator. Therefore, their solution paths are regularization paths better than the LASSO regularization path, since the points on the latter path are biased when sign-consistency is reached. We also show how to efficiently compute their solution paths in both continuous and discretized settings: the full solution paths can be exactly computed piece by piece, and a discretization leads to Linearized Bregman iteration, which is a simple iterative thresholding rule and easy to parallelize. Theoretical guarantees such as sign-consistency and minimax optimal $l_2$-error bounds are established in both continuous and discrete settings for specific points on the paths. Early-stopping rules for identifying these points are given. The key treatment relies on the development of differential inequalities for differential inclusions and their discretizations, which extends the previous results and leads to exponentially fast recovery of sparse signals before wrong ones are selected.
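The "simple iterative thresholding rule" mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly stated form of the Linearized Bregman iteration for a least-squares loss: a gradient step on an auxiliary variable z, followed by soft-thresholding at level 1 scaled by a damping factor. The function names, the parameter names `kappa` and `alpha`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def shrink(z, threshold=1.0):
    """Soft-thresholding: move z toward zero by `threshold`, clipping at zero."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def linearized_bregman_path(X, y, kappa=1.0, alpha=0.5, n_iter=4):
    """Return the iterates beta_0, ..., beta_{n_iter} as a list of lists.

    X is a list of n rows (each of length p); y is a list of n responses.
    Hypothetical helper for illustration; kappa (damping) and alpha (step
    size) are assumed names.
    """
    n, p = len(X), len(X[0])
    z = [0.0] * p      # auxiliary (dual-like) variable, starts at zero
    beta = [0.0] * p   # sparse estimate, starts at the null model
    path = [beta[:]]
    for _ in range(n_iter):
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Negative gradient of the loss (1/2n) * ||y - X beta||^2.
        neg_grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(p)]
        z = [z[j] + alpha * neg_grad[j] for j in range(p)]
        beta = [kappa * shrink(z[j]) for j in range(p)]
        path.append(beta[:])
    return path


# Toy one-dimensional problem: a single observation y = 2 with design X = [1].
path = linearized_bregman_path([[1.0]], [2.0], kappa=1.0, alpha=0.5, n_iter=4)
print(path)  # [[0.0], [0.0], [1.0], [1.5], [1.75]]
```

In this toy run the coefficient stays at zero until z exceeds the threshold and then moves toward the least-squares value 2, so stopping the iteration early acts as regularization. In practice the step size has to be chosen small enough relative to the damping factor and the design matrix for the iteration to remain stable.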

preprint2015arXiv

Aharonov-Bohm phases in a quantum LC circuit

We study novel types of contributions to the partition function of the Maxwell system defined on a small compact manifold. These contributions, often not addressed in the perturbative treatment with physical photons, emerge as a result of tunneling transitions between topologically distinct but physically identical vacuum winding states. These new terms give an extra contribution to the Casimir pressure, yet to be measured. We argue that this effect is highly sensitive to a small external electric field, which should be contrasted with the conventional Casimir effect where the vacuum photons are essentially unaffected by any external field. Furthermore, photons will be emitted from the vacuum in response to a time-dependent electric field, similar to the dynamical Casimir effect in which real particles are radiated from the vacuum due to the time-dependent boundary conditions. We also propose an experimental setup using a quantum LC circuit to detect this novel effect. We expect physical electric charges to appear on the capacitor plates when the system dimension is such that coherent Aharonov-Bohm phases can be maintained over macroscopically large distances.

preprint2015arXiv

Determining phase-space properties of the IHEP RFQ output beam using the RMS beam widths from wire-scanners

A beam line is built after the IHEP RFQ for halo study. To determine the transverse emittance and ellipse parameters of the RFQ output beam, beam size data obtained from the first two of 14 wire scanners are employed. By using the transfer matrix method and the least squares method, a set of linear equations was set up and solved. The solutions were then applied as initial beam parameters in multi-particle simulations to check the method of calculation. It is shown that the difference between the simulated RMS beam size and the measured one at the measurement location is less than 7%, which is acceptable in our experiments.
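For orientation, the kind of linear system that arises from a transfer-matrix and least-squares analysis of RMS beam widths can be written down in standard textbook form. This is a sketch under the assumption of linear, uncoupled transport in one transverse plane; the symbols ($\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$ for the beam-matrix elements at the RFQ exit, $R^{(k)}$ for the transfer matrix to the $k$-th measurement) are illustrative and not taken from the paper.

```latex
% Squared RMS width at measurement k is linear in the unknown
% beam-matrix elements at the RFQ exit:
\sigma_{x,k}^{2}
  = \bigl(R_{11}^{(k)}\bigr)^{2}\,\sigma_{11}
  + 2\,R_{11}^{(k)} R_{12}^{(k)}\,\sigma_{12}
  + \bigl(R_{12}^{(k)}\bigr)^{2}\,\sigma_{22},
  \qquad k = 1,\dots,N .

% With N >= 3 measurements the unknowns follow from least squares,
% and the RMS emittance and ellipse (Twiss) parameters are
\varepsilon_{\mathrm{rms}} = \sqrt{\sigma_{11}\sigma_{22} - \sigma_{12}^{2}},
\qquad
\beta = \frac{\sigma_{11}}{\varepsilon_{\mathrm{rms}}},\quad
\alpha = -\frac{\sigma_{12}}{\varepsilon_{\mathrm{rms}}},\quad
\gamma = \frac{\sigma_{22}}{\varepsilon_{\mathrm{rms}}} .
```

Each measurement contributes one equation, so different measurements must correspond to different transfer matrices (for example, different locations or optics settings) for the system to be solvable.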

preprint2015arXiv

Geometric Tight Frame based Stylometry for Art Authentication of van Gogh Paintings

This paper is about authenticating genuine van Gogh paintings from forgeries. The authentication process depends on two key steps: feature extraction and outlier detection. In this paper, a geometric tight frame and some simple statistics of the tight frame coefficients are used to extract features from the paintings. Then a forward stage-wise rank boosting is used to select a small set of features for more accurate classification so that van Gogh paintings are highly concentrated towards some center point while forgeries are spread out as outliers. Numerical results show that our method can achieve 86.08% classification accuracy under the leave-one-out cross-validation procedure. Our method also identifies five features that are much more predominant than other features. Using just these five features for classification, our method can give 88.61% classification accuracy which is the highest so far reported in literature. Evaluation of the five features is also performed on two hundred datasets generated by bootstrap sampling with replacement. The median and the mean are 88.61% and 87.77% respectively. Our results show that a small set of statistics of the tight frame coefficients along certain orientations can serve as discriminative features for van Gogh paintings. It is more important to look at the tail distributions of such directional coefficients than mean values and standard deviations. It reflects a highly consistent style in van Gogh's brushstroke movements, where many forgeries demonstrate a more diverse spread in these features.

preprint2015arXiv

High Electron Mobility and Large Magnetoresistance in the Half-Heusler Semimetal LuPtBi

Materials with high carrier mobility showing large magnetoresistance (MR) have recently received much attention because of potential applications in future high-performance magneto-electric devices. Here, we report on the discovery of an electron-hole-compensated half-Heusler semimetal LuPtBi that exhibits an extremely high electron mobility of up to 79000 cm²/Vs with a non-saturating positive MR as large as 3200% at 2 K. Remarkably, the mobility at 300 K is found to exceed 10500 cm²/Vs, which is among the highest values reported in three-dimensional bulk materials thus far. The clean Shubnikov-de Haas quantum oscillation observed at low temperatures and the first-principles calculations together indicate that the high electron mobility is due to a rather small effective carrier mass caused by the distinctive band structure of the crystal. Our findings provide a new approach for finding large, high-mobility MR materials by designing an appropriate Fermi surface topology starting from simple electron-hole-compensated semimetals.

preprint2015arXiv

Low-temperature linear transport of two-dimensional massive Dirac fermions in silicene: residual conductivity and spin/valley Hall effects

Considering finite-temperature screened electron-impurity scattering, we present a kinetic equation approach to investigate transport properties of two-dimensional massive fermions in silicene. We find that the longitudinal conductivity is always nonvanishing when chemical potential lies within the energy gap. This residual conductivity arises from interband correlation and strongly depends on strength of electron-impurity scattering. We also clarify that the electron-impurity interaction makes substantial contributions to the spin- and valley-Hall conductivities, which, however, are almost independent of impurity density. The dependencies of longitudinal conductivity as well as of spin- and valley-Hall conductivities on chemical potential, on temperature, and on gap energy are analyzed.

preprint2015arXiv

Mixed and missing data: a unified treatment with latent graphical models

We propose to learn latent graphical models when data have mixed variables and missing values. This model could be used for further data analysis, including regression, classification, ranking, etc. It could also be used for imputing missing values. We specify a latent Gaussian model for the data, where the categorical variables are generated by discretizing an unobserved variable and the latent variables are multivariate Gaussian. The observed data consist of two parts: observed Gaussian variables and observed categorical variables, where the latter part is considered as partially missing Gaussian variables. We use the Expectation-Maximization algorithm to fit the model. To prevent overfitting, we use sparse inverse covariance estimation to obtain a sparse estimate of the latent covariance matrix, equivalently, the graphical model. The fitted model could then be used for problems including regression, classification and ranking. Such an approach is applied to a medical data set where our method outperforms the state-of-the-art methods. Simulation studies and real data results suggest that our proposed model performs better than random forest in terms of prediction error when the model is correctly specified, and is a better imputation method than hot deck imputation even if the model is not correctly specified.

preprint2015arXiv

Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels because human annotators are much better at ranking two images/videos (e.g. which one is more interesting) than giving an absolute value to each of them separately. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large number of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning to rank problem, tackling both the outlier detection and learning to rank jointly. Differing from existing methods, the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Extensive experiments on various benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives.

preprint2014arXiv

Black Silicon Solar Thin-film Microcells Integrating Top Nanocone Structures for Broadband and Omnidirectional Light-Trapping

Recently developed classes of monocrystalline silicon solar microcells (μ-cell) can be assembled into modules with characteristics (i.e., mechanically flexible forms, compact concentrator designs, and high-voltage outputs) that would be impossible to achieve using conventional, wafer-based approaches. In this paper, we describe a highly dense, uniform and non-periodic nanocone forest structure of black silicon (bSi) created on optically-thin (30 μm) μ-cells for broadband and omnidirectional light-trapping with a lithography-free and high-throughput plasma texturizing process. With optimized plasma etching conditions and a silicon nitride passivation layer, black silicon μ-cells, when embedded in a polymer waveguiding layer, display dramatic increases of as much as 65.7% in short circuit current, as compared to a bare silicon device. The conversion efficiency increases from 8% to 11.5% with a small drop in open circuit voltage and fill factor.

preprint2014arXiv

Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience

Outlier detection is an integral part of robust evaluation for crowdsourceable Quality of Experience (QoE) and has attracted much attention in recent years. In QoE for multimedia, outliers happen because of different test conditions, human errors, abnormal variations in context, etc. In this paper, we propose a simple yet effective algorithm for outlier detection and robust QoE evaluation named iterative Least Trimmed Squares (iLTS). The algorithm assigns binary weights to samples, i.e., 0 or 1 indicating whether a sample is an outlier; the least squares solution on the outlier-trimmed subset then gives robust ranking scores. An iterative optimization is carried out alternately between updating weights and ranking scores, and it converges to a local optimizer in finitely many steps. In our test setting, iLTS is up to 190 times faster than LASSO-based methods with comparable performance. Moreover, a variant of this method is adaptive in outlier detection: it automatically determines whether a data sample is an outlier without a priori knowledge of the number of outliers. The effectiveness and efficiency of iLTS are demonstrated on both simulated examples and real-world applications. A Matlab package is provided to researchers exploiting crowdsourcing paired comparison data for robust ranking.

preprint2014arXiv

Regression Analysis with Response-biased Sampling

Response-biased sampling, in which samples are drawn from a population according to the values of the response variable, is common in biomedical, epidemiological, economic and social studies. In particular, the complete observations in data with censoring, truncation or missing covariates can be regarded as response-biased sampling under certain conditions. This paper proposes to use transformation models, known as the generalized accelerated failure time model in econometrics, for regression analysis with response-biased sampling. With unknown error distribution, the transformation models are broad enough to cover linear regression models, the Cox's model and the proportional odds model as special cases. To the best of our knowledge, except for the case-control logistic regression, there is no report in the literature that a prospective estimation approach can work for biased sampling without any modification. We prove that the maximum rank correlation estimation is valid for response-biased sampling and establish its consistency and asymptotic normality. Unlike the inverse probability methods, the proposed method of estimation does not involve the sampling probabilities, which are often difficult to obtain in practice. Without the need of estimating the unknown transformation function or the error distribution, the proposed method is numerically easy to implement with the Nelder-Mead simplex algorithm, which does not require convexity or continuity. We propose an inference procedure using random weighting to avoid the complication of density estimation when using the plug-in rule for variance estimation. Numerical studies with supportive evidence are presented. Applications are illustrated with the Forbes Global 2000 data and the Stanford heart transplant data.

preprint 2013 arXiv

Hierarchical Nystrom Methods for Constructing Markov State Models for Conformational Dynamics

Markov state models (MSMs) have become a popular approach for investigating the conformational dynamics of proteins and other biomolecules. MSMs are typically built from numerous molecular dynamics simulations by dividing the sampled configurations into a large number of microstates based on geometric criteria. The resulting microstate model can then be coarse-grained into a more understandable macro state model by lumping together rapidly mixing microstates into larger, metastable aggregates. However, finite sampling often results in the creation of many poorly sampled microstates. During coarse-graining, these states are mistakenly identified as being kinetically important because transitions to/from them appear to be slow. In this paper we propose a formalism based on an algebraic principle for matrix approximation, i.e. the Nystrom method, to deal with such poorly sampled microstates. Our scheme builds a hierarchy of microstates from high to low populations and progressively applies spectral clustering on sets of microstates within each level of the hierarchy. It helps spectral clustering identify metastable aggregates with highly populated microstates rather than being distracted by lowly populated states. We demonstrate the ability of this algorithm to discover the major metastable states on two model systems, the alanine dipeptide and TrpZip2.

preprint 2013 arXiv

Online Learning as Stochastic Approximation of Regularization Paths

In this paper, an online learning algorithm is proposed as sequential stochastic approximation of a regularization path converging to the regression function in reproducing kernel Hilbert spaces (RKHSs). We show that it is possible to produce the best known strong (RKHS norm) convergence rate of batch learning, through a careful choice of the gain or step size sequences, depending on regularity assumptions on the regression function. The corresponding weak (mean square distance) convergence rate is optimal in the sense that it reaches the minimax and individual lower rates in the literature. In both cases we deduce almost sure convergence, using Bernstein-type inequalities for martingales in Hilbert spaces. To achieve this we develop a bias-variance decomposition similar to the batch learning setting; the bias consists in the approximation and drift errors along the regularization path, which display the same rates of convergence, and the variance arises from the sample error analysed as a reverse martingale difference sequence. The rates above are obtained by an optimal trade-off between the bias and the variance.
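
A minimal sketch of one common form of such an online update is given below. The update rule f_{t+1} = f_t - gamma_t [(f_t(x_t) - y_t) K(x_t, .) + lambda_t f_t] is an assumption (a regularized stochastic gradient step), not taken from the abstract, and the function name and the `gamma` and `lam` schedules are hypothetical placeholders rather than the rates analysed in the paper.

```python
import numpy as np

def online_kernel_regression(xs, ys, kernel, gamma, lam):
    """Hypothetical sketch: one pass of regularized online learning in an RKHS.

    The iterate is stored as f_t = sum_i coef[i] * kernel(xs[i], .).
    gamma(t) is the step size and lam(t) the regularization parameter at step t.
    """
    coef = np.zeros(len(xs))
    for t in range(len(xs)):
        # Prediction f_t(x_t) from the current kernel expansion.
        pred = sum(coef[i] * kernel(xs[i], xs[t]) for i in range(t))
        # Shrink the existing coefficients: the -gamma_t * lam_t * f_t term.
        coef[:t] *= 1.0 - gamma(t) * lam(t)
        # Add the new kernel section: the -gamma_t * (f_t(x_t) - y_t) K(x_t, .) term.
        coef[t] = -gamma(t) * (pred - ys[t])
    return coef
```

Letting `lam(t)` decrease over time is what makes the iterates follow a regularization path rather than a single regularized solution, and the choice of `gamma(t)` relative to `lam(t)` governs the bias-variance trade-off that the abstract describes.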

preprint 2012 arXiv

MaTrust: An Effective Multi-Aspect Trust Inference Model

Trust is a fundamental concept in many real-world applications such as e-commerce and peer-to-peer networks. In these applications, users can generate local opinions about the counterparts based on direct experiences, and these opinions can then be aggregated to build trust among unknown users. The mechanism to build new trust relationships based on existing ones is referred to as trust inference. State-of-the-art trust inference approaches employ the transitivity property of trust by propagating trust along connected users. In this paper, we propose a novel trust inference model (MaTrust) by exploring an equally important property of trust, i.e., the multi-aspect property. MaTrust directly characterizes multiple latent factors for each trustor and trustee from the locally-generated trust relationships. Furthermore, it can naturally incorporate prior knowledge as specified factors. These factors in turn serve as the basis to infer the unseen trustworthiness scores. Experimental evaluations on real data sets show that the proposed MaTrust significantly outperforms several benchmark trust inference models in both effectiveness and efficiency.

preprint 2012 arXiv

Study of point spread in aberration-corrected high-resolution transmission electron microscopy

For quantitative electron microscopy, high-precision position information is necessary, so that besides adequate resolution and sufficiently strong contrast of atoms, a small width of the peaks that represent atoms in structural images is needed. The peak size is determined by the point spread (PS) of the instrument as well as that of the atoms when the point resolution reaches the subangstrom scale, so that the PS of the instrument is comparable with that of the atoms. In this article, the relationship of PS to atomic number, sample thickness, and spherical aberration coefficient is studied in both negative Cs imaging (NCSI) and positive Cs imaging (PCSI) modes by means of dynamical image simulation. By comparing the peak width for different thicknesses and different values of spherical aberration, the NCSI mode is found to be superior to PCSI, as it gives a smaller peak width in the structural image.

preprint 2012 arXiv

The Landscape of Complex Networks

Topological landscape is introduced for networks with functions defined on the nodes. By extending the notion of gradient flows to the network setting, critical nodes of different indices are defined. This leads to a concise and hierarchical representation of the network. Persistent homology from computational topology is used to design efficient algorithms for performing such analysis. Applications to some examples in social and biological networks are demonstrated, which show that critical nodes carry important information about structures and dynamics of such networks.

preprint 2011 arXiv

Array independent MIMO channel models with analytical characteristics

The conventional analytical channel models for multiple-input multiple-output (MIMO) wireless radio channels are array dependent. In this paper, we present several array independent MIMO channel models that inherit the essence of analytical models. The key idea is to decompose the physical scattering channel into two parts using the manifold decomposition technique: one is the wavefield independent sampling matrices depending on the antenna arrays only; the other is the array independent physical channel that can be individually modeled in an analytical manner. Based on the framework, we firstly extend the conventional virtual channel representation (VCR), which is restricted to uniform linear arrays (ULAs) so far, to a general version applicable to arbitrary array configurations. Then, we present two array independent stochastic MIMO channel models based on the proposed new VCR as well as the Weichselberger model. These two models are good at angular power spectrum (APS) estimation and capacity prediction, respectively. Finally, the impact of array characteristics on channel capacity is separately investigated by studying the condition number of the array steering matrix at fixed angles, and the results agree well with existing conclusions. Numerical results are presented for model validation and comparison.

preprint 2011 arXiv

Compressive Network Analysis

Modern data acquisition routinely produces massive amounts of network data. Though many methods and models have been proposed to analyze such data, research on network data is largely disconnected from the classical theory of statistical learning and signal processing. In this paper, we present a new framework for modeling network data, which connects two seemingly different areas: network data analysis and compressed sensing. From a nonparametric perspective, we model an observed network using a large dictionary. In particular, we consider the network clique detection problem and show connections between our formulation and a new algebraic tool, namely Radon basis pursuit in homogeneous spaces. Such a connection allows us to identify rigorous recovery conditions for clique detection problems. Though this paper is mainly conceptual, we also develop practical approximation algorithms for solving empirical problems and demonstrate their usefulness on real-world datasets.

preprint 2010 arXiv

(Quasi-)Poisson enveloping algebras

We introduce the quasi-Poisson enveloping algebra and Poisson enveloping algebra for a non-commutative Poisson algebra. We prove that for a non-commutative Poisson algebra, the category of quasi-Poisson modules is equivalent to the category of left modules over its quasi-Poisson enveloping algebra, and the category of Poisson modules is equivalent to the category of left modules over its Poisson enveloping algebra.

81 published item(s)

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Uncertainty Quantification for LLM-based Code Generation

CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

Duality viewpoint of criticality

The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

A Roadmap for Big Model

A special cross-tie domain wall in helimagnet

Capacity Analysis of Holographic MIMO Channels with Practical Constraints

Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

Detecting Topology Attacks against Graph Neural Networks

Equiangular lines with a fixed angle

Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces

Fine-Grained Scene Graph Generation with Data Transfer

From Cascades to $J$-holomorphic Curves and Back

Gappability Index for Quantum Many-Body Systems

Gate-Level Side-Channel Leakage Assessment with Architecture Correlation Analysis

Generative Adversarial Networks for Robust Cryo-EM Image Denoising

Geometric approach to Lieb-Schultz-Mattis theorem without translation symmetry under inversion or rotation symmetry

Observation of short-period helical spin order and magnetic transition in a non-chiral centrosymmetric helimagnet

On Private Online Convex Optimization: Optimal Algorithms in $\ell_p$-Geometry and High Dimensional Contextual Bandits

Prompt Tuning for Discriminative Pre-trained Language Models

Structure-Aware Flow Generation for Human Body Reshaping

Tracking the nematicity in cuprate superconductors: a resistivity study under uniaxial pressure

Unsupervised Domain Adaptation through Shape Modeling for Medical Image Segmentation

Evaluating Visual Properties via Robust HodgeRank

Fast differentiable evolution of quantum states under Gaussian transformations

Natural Gradient Optimization for Optical Quantum Circuits

On Stochastic Variance Reduced Gradient Method for Semidefinite Optimization

Particle-hole symmetry breaking in a spin-dimer system TlCuCl$_3$ observed at 100 T

Polyimide-Based Flexible Coupled-Coils Design and Load-Shift Keying Analysis

Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics

StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

UPRec: User-Aware Pre-training for Recommender Systems

$\textit{Ab Initio}$ Mismatched Interface Theory of Graphene on $α$-RuCl$_3$: Doping and Magnetism

$α$ Decay Half-life Estimation and Uncertainty Analysis

A generalized boundary condition applied to Lieb-Schultz-Mattis type ingappabilities and many-body Chern numbers

Accurate many-body electronic structure near the basis set limit: application to the chromium dimer

Boosting Semantic Human Matting with Coarse Annotations

Chemistry of the spin-1/2 kagome Heisenberg antiferromagnet

DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths

Efficient Estimation For The Cox Proportional Hazards Cure Model

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect

Large anomalous Hall effect in a hexagonal ferromagnetic Fe5Sn3 single crystal

Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of finding the needle in a haystack

Leveraging both Lesion Features and Procedural Bias in Neuroimaging: An Dual-Task Split dynamics of inverse scale space

Nonlinear parameter-gauge coupling approach to generalization of generalized Thouless pumps and $-1$-form anomaly

Self-controlled growth of highly uniform Ge/Si hut wires for scalable qubit devices

Two-photon interference: the Hong-Ou-Mandel effect

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

Direct comparison of many-body methods for realistic electronic Hamiltonians

Observation of Magnetic Skyrmion Bubbles in a van der Waals ferromagnet Fe3GeTe2

MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning

A Tutorial on Libra: R package for the Linearized Bregman Algorithm in High Dimensional Statistics

Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs

False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking

Optical frequency divider with division uncertainty at the 10^(-21) level

Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation

Sparse Recovery via Differential Inclusions

Aharonov-Bohm phases in a quantum LC circuit

Determining phase-space properties of the IHEP RFQ output beam using the RMS beam widths from wire-scanners

Geometric Tight Frame based Stylometry for Art Authentication of van Gogh Paintings

High Electron Mobility and Large Magnetoresistance in the Half-Heusler Semimetal LuPtBi

Low-temperature linear transport of two-dimensional massive Dirac fermions in silicene: residual conductivity and spin/valley Hall effects

Mixed and missing data: a unified treatment with latent graphical models

Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

Black Silicon Solar Thin-film Microcells Integrating Top Nanocone Structures for Broadband and Omnidirectional Light-Trapping

Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience

Regression Analysis with Response-biased Sampling

Hierarchical Nystrom Methods for Constructing Markov State Models for Conformational Dynamics