Researcher profile

Jian Liang

Jian Liang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
30 works
0 followers
8 topics
4 close collaborators

Published work

30 published items

preprint | 2026 | arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint | 2026 | arXiv

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogeneous task coverage and reward formulations that inadequately reflect practical long-context requirements. Our work offers two contributions. (1) Capability-oriented data construction with full open release. We openly release a dataset of 23K RLVR samples, the complete construction pipeline, and all training code. Guided by a taxonomy of long-context capabilities, the dataset spans 9 task types, each paired with its natural evaluation metric. It comprises curated open-source samples from established corpora and synthetic samples whose QA pairs are generated from real source documents such as books, academic papers, and multi-turn dialogues. Under the same vanilla GRPO setup, our dataset alone outperforms the closed-source QwenLong-L1.5 dataset. Moreover, our Qwen3-30B-A3B model trained on this data delivers long-context performance comparable to DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507, suggesting that broader coverage and greater reward diversity substantially benefit long-context capability improvement. (2) TMN-Reweight for heterogeneous multitask optimization. To address optimization challenges from heterogeneous rewards, we propose TMN-Reweight, which combines task-level mean normalization for cross-task reward scale alignment with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight further improves average performance over vanilla GRPO, with general capabilities preserved or improved across reported evaluations.

preprint | 2025 | arXiv

$η$ and $η'$ mesons from $N_f = 2+1$ lattice QCD at the physical point using topological charge operators

By fitting the two-point correlation functions of topological charge density operators calculated on two $2+1$-flavor gauge ensembles with physical pion mass, we determine both the $η$ and $η'$ masses and also the mixing angle to be $m_η= 0.505(72)(75)$ GeV, $m_{η'}=0.952(47)(40)$ GeV, and $θ_1 = -8.9(2.1)(1.8)^\circ$, respectively, where the first error is the statistical uncertainty and the second one is the systematic uncertainty. This is the first extraction of both $η/η'$ masses and the mixing angle $θ_1$ using topological charge operators. Compared with previous studies using quark bilinear operators, the error of the $η$ mass is relatively large, but the mixing angle has comparable precision. This demonstrates that the topological charge operators are well suited to study the $η$ and $η'$ mesons.

preprint | 2025 | arXiv

DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing

Deep Research agents predominantly optimize search policies to maximize retrieval probability. However, we identify a critical bottleneck: the retrieval-utilization gap, where models fail to use gold evidence even after it is retrieved, due to context blindness in noisy environments. To bridge this gap, we propose DeepResearch-Slice, a simple yet effective neuro-symbolic framework. Unlike implicit attention, our approach predicts precise span indices to perform a deterministic hard filter before reasoning. Extensive evaluations across six benchmarks show substantial robustness gains. Applying our method to frozen backbones yields a 73 percent relative improvement, from 19.1 percent to 33.0 percent, effectively mitigating noise without requiring parameter updates to the reasoning model. These results highlight the need for explicit grounding mechanisms in open-ended research.
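
The hard-filter step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `slice_context` and `relative_improvement`, and the half-open character-offset span format, are hypothetical, and the learned span predictor is replaced by hand-supplied indices.

```python
from typing import List, Tuple

# Hypothetical span format: half-open character offsets [start, end).
Span = Tuple[int, int]


def slice_context(document: str, spans: List[Span], sep: str = " ") -> str:
    """Keep only the text covered by the spans (a deterministic hard filter)."""
    # Clamp spans to the document bounds and drop empty ones.
    cleaned: List[Span] = []
    for start, end in spans:
        start, end = max(0, start), min(len(document), end)
        if start < end:
            cleaned.append((start, end))

    # Sort, then merge overlapping or touching spans so no text is repeated.
    cleaned.sort()
    merged: List[Span] = []
    for start, end in cleaned:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Everything outside the merged spans is discarded before reasoning.
    return sep.join(document[s:e] for s, e in merged)


def relative_improvement(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, as a fraction."""
    return (after - before) / before
```

With the figures quoted in the abstract, `relative_improvement(19.1, 33.0)` is about 0.728, which matches the reported 73 percent relative improvement.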

preprint | 2023 | arXiv

Learning Feature Recovery Transformer for Occluded Person Re-identification

One major issue that challenges person re-identification (Re-ID) is the ubiquitous occlusion over the captured persons. There are two main challenges for the occluded person Re-ID problem, i.e., the interference of noise during feature matching and the loss of pedestrian information brought by the occlusions. In this paper, we propose a new approach called Feature Recovery Transformer (FRT) to address the two challenges simultaneously, which mainly consists of visibility graph matching and feature recovery transformer. To reduce the interference of the noise during feature matching, we mainly focus on visible regions that appear in both images and develop a visibility graph to calculate the similarity. In terms of the second challenge, based on the developed graph similarity, for each query image, we propose a recovery transformer that exploits the feature sets of its $k$-nearest neighbors in the gallery to recover the complete features. Extensive experiments across different person Re-ID datasets, including occluded, partial and holistic datasets, demonstrate the effectiveness of FRT. Specifically, FRT significantly outperforms state-of-the-art results by at least 6.2% Rank-1 accuracy and 7.2% mAP scores on the challenging Occluded-Duke dataset. The code is available at https://github.com/xbq1994/Feature-Recovery-Transformer.

preprint | 2023 | arXiv

Nucleon Electric Dipole Moment from the $θ$ Term with Lattice Chiral Fermions

We calculate the nucleon electric dipole moment (EDM) from the $θ$ term with overlap fermions on three domain wall lattices with different sea pion masses at lattice spacing 0.11 fm. Due to the chiral symmetry conserved by the overlap fermions, we have well defined topological charge and chiral limit for the EDM. Thus, the chiral extrapolation can be carried out reliably at nonzero lattice spacings. We use three to four different partially quenched valence pion masses for each sea pion mass and find that the EDM dependence on the valence and sea pion masses behaves oppositely, which can be described by partially quenched chiral perturbation theory. With the help of the cluster decomposition error reduction (CDER) technique, we determine the neutron and proton EDM at the physical pion mass to be $d_{n}=-0.00148\left(14\right)\left(31\right)\barθ$ e$\cdot$fm and $d_{p}=0.0038\left(11\right)\left(8\right)\barθ$ e$\cdot$fm. This work is a clear demonstration of the advantages of using chiral fermions in the nucleon EDM calculation and paves the road to future precise studies of the strong $CP$ violation effects.

preprint | 2022 | arXiv

A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT

In the field of car evaluation, more and more netizens choose to express their opinions on Internet platforms, and these comments affect buyers' decisions and the trend of car word-of-mouth. As an important branch of natural language processing (NLP), sentiment analysis provides an effective research method for analyzing the sentiment types of massive car review texts. However, because review texts in the automotive field use specialized vocabulary and contain considerable noise, a general sentiment analysis model achieves poor accuracy when applied to car reviews. To overcome these challenges, we focus on the sentiment analysis task for car review texts. From the perspective of word vectors, pre-training is carried out by whole-word masking of the proprietary vocabulary of the automotive field, and training is then carried out with an adversarial training strategy. Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT (ATWWM-BERT).

preprint | 2022 | arXiv

Adversarial Filtering Modeling on Long-term User Behavior Sequences for Click-Through Rate Prediction

Rich user behavior information is of great importance for capturing and understanding user interest in click-through rate (CTR) prediction. To improve the richness, collecting long-term behaviors has become a typical approach in academia and industry, but at the cost of increased online storage and latency. Recently, researchers have proposed several approaches to shorten long-term behavior sequences and then model user interests. These approaches reduce online cost efficiently but do not handle the noisy information in long-term user behavior well, which may significantly degrade CTR prediction performance. To obtain a better cost/performance trade-off, we propose a novel Adversarial Filtering Model (ADFM) to model long-term user behavior. ADFM uses a hierarchical aggregation representation to compress the raw behavior sequence and then learns to remove useless behavior information with an adversarial filtering mechanism. The selected user behaviors are fed into an interest extraction module for CTR prediction. Experimental results on public datasets and an industrial dataset demonstrate that our method achieves significant improvements over state-of-the-art models.

preprint | 2022 | arXiv

Causality Inspired Representation Learning for Domain Generalization

Domain generalization (DG) is essentially an out-of-distribution problem, aiming to generalize the knowledge learned from multiple source domains to an unseen target domain. The mainstream approach is to leverage statistical models to model the dependence between data and labels, intending to learn representations independent of domain. Nevertheless, the statistical models are superficial descriptions of reality since they are only required to model dependence instead of the intrinsic causal mechanism. When the dependence changes with the target distribution, the statistical models may fail to generalize. In this regard, we introduce a general structural causal model to formalize the DG problem. Specifically, we assume that each input is constructed from a mix of causal factors (whose relationship with the label is invariant across domains) and non-causal factors (category-independent), and only the former cause the classification judgments. Our goal is to extract the causal factors from inputs and then reconstruct the invariant causal mechanisms. However, the theoretical idea is far from practical for DG since the required causal/non-causal factors are unobserved. We highlight that ideal causal factors should meet three basic properties: separated from the non-causal ones, jointly independent, and causally sufficient for the classification. Based on that, we propose a Causality Inspired Representation Learning (CIRL) algorithm that enforces the representations to satisfy the above properties and then uses them to simulate the causal factors, which yields improved generalization ability. Extensive experimental results on several widely used datasets verify the effectiveness of our approach.

preprint | 2022 | arXiv

Collaboration Equilibrium in Federated Learning

Federated learning (FL) refers to the paradigm of learning models over a collaborative research network involving multiple clients without sacrificing privacy. Recently, there have been rising concerns about the distributional discrepancies across different clients, which could even cause counterproductive consequences when collaborating with others. Since collaborating with all clients does not necessarily achieve the best performance, in this paper we study a rational collaboration called "collaboration equilibrium" (CE), where smaller collaboration coalitions are formed. Each client collaborates with certain members who maximally improve the model learning and isolates the others who make little contribution. We propose the concept of a benefit graph, which describes how each client can benefit from collaborating with other clients, and advance a Pareto optimization approach to identify the optimal collaborators. Then we theoretically prove that we can reach a CE from the benefit graph through an iterative graph operation. Our framework provides a new way of setting up collaborations in a research network. Experiments on both synthetic and real world data sets are provided to demonstrate the effectiveness of our method.

preprint | 2022 | arXiv

DINE: Domain Adaptation from Single and Multiple Black-box Predictors

To ease the burden of labeling, unsupervised domain adaptation (UDA) aims to transfer knowledge in previous and related labeled datasets (sources) to a new unlabeled dataset (target). Despite impressive progress, prior methods always need to access the raw source data and develop data-dependent alignment approaches to recognize the target samples in a transductive learning manner, which may raise privacy concerns from source individuals. Several recent studies resort to an alternative solution by exploiting the well-trained white-box model from the source domain, yet it may still leak the raw data through generative adversarial learning. This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). Taking into consideration the target data structure, DINE first distills the knowledge from the source predictor to a customized target model, then fine-tunes the distilled model to further fit the target domain. Besides, neural networks are not required to be identical across domains in DINE, even allowing effective adaptation on a low-resource device. Empirical results on three UDA scenarios (i.e., single-source, multi-source, and partial-set) confirm that DINE achieves highly competitive performance compared to state-of-the-art data-dependent approaches. Code is available at https://github.com/tim-learn/DINE/.

preprint | 2022 | arXiv

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Recently, the most successful image synthesis models have been multi-stage processes that combine the advantages of different methods, and they always include a VAE-like model for faithfully reconstructing an image from its embedding and a prior model to generate the image embedding. At the same time, diffusion models have shown the capacity to generate high-quality synthetic images. Our work proposes a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis. We explore how to input the image embedding into the diffusion model for excellent performance and find that a simple modification to the diffusion model's UNet can achieve it. Trained on ImageNet, our model achieves state-of-the-art results and, in particular, generates more photorealistic images. In addition, we apply DiVAE with an auto-regressive generator on conditional synthesis tasks to produce more human-feeling and detailed samples.

preprint | 2022 | arXiv

Finding Diverse and Predictable Subgraphs for Graph Domain Generalization

This paper focuses on out-of-distribution generalization on graphs where performance drops due to the unseen distribution shift. Previous graph domain generalization works always resort to learning an invariant predictor among different source domains. However, they assume sufficient source domains are available during training, posing huge challenges for realistic applications. By contrast, we propose a new graph domain generalization framework, dubbed DPS, by constructing multiple populations from the source domains. Specifically, DPS aims to discover multiple Diverse and Predictable Subgraphs with a set of generators; that is, the subgraphs are different from each other but all of them share the same semantics with the input graph. These generated source domains are exploited to learn an equi-predictive graph neural network (GNN) across domains, which is expected to generalize well to unseen target domains. Generally, DPS is model-agnostic and can be incorporated with various GNN backbones. Extensive experiments on both node-level and graph-level benchmarks show that the proposed DPS achieves impressive performance for various graph domain generalization tasks.

preprint | 2022 | arXiv

Heterogeneous Face Recognition via Face Synthesis with Identity-Attribute Disentanglement

Heterogeneous Face Recognition (HFR) aims to match faces across different domains (e.g., visible to near-infrared images), which has been widely applied in authentication and forensics scenarios. However, HFR is a challenging problem because of the large cross-domain discrepancy, limited heterogeneous data pairs, and large variation of facial attributes. To address these challenges, we propose a new HFR method from the perspective of heterogeneous data augmentation, named Face Synthesis with Identity-Attribute Disentanglement (FSIAD). Firstly, the identity-attribute disentanglement (IAD) decouples face images into identity-related representations and identity-unrelated representations (called attributes), and then decreases the correlation between identities and attributes. Secondly, we devise a face synthesis module (FSM) to generate a large number of images with stochastic combinations of disentangled identities and attributes for enriching the attribute diversity of synthetic images. Both the original images and the synthetic ones are utilized to train the HFR network for tackling the challenges and improving the performance of HFR. Extensive experiments on five HFR databases validate that FSIAD obtains superior performance to previous HFR approaches. Particularly, FSIAD obtains a 4.8% improvement over the state of the art in terms of VR@FAR=0.01% on LAMP-HQ, the largest HFR database so far.

preprint | 2022 | arXiv

Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification

Domain generalizable (DG) person re-identification (ReID) aims to test across unseen domains without access to the target domain data at training time, which is a realistic but challenging problem. In contrast to methods assuming an identical model for different domains, Mixture of Experts (MoE) exploits multiple domain-specific networks for leveraging complementary information between domains, obtaining impressive results. However, prior MoE-based DG ReID methods suffer from a large model size with the increase of the number of source domains, and most of them overlook the exploitation of domain-invariant characteristics. To handle the two issues above, this paper presents a new approach called Mimic Embedding via adapTive Aggregation (META) for DG person ReID. To avoid the large model size, experts in META do not adopt a branch network for each source domain but share all the parameters except for the batch normalization layers. Besides multiple experts, META leverages Instance Normalization (IN) and introduces it into a global branch to pursue invariant features across domains. Meanwhile, META considers the relevance of an unseen target sample and source domains via normalization statistics and develops an aggregation module to adaptively integrate multiple experts for mimicking unseen target domain. Benefiting from a proposed consistency loss and an episodic training algorithm, META is expected to mimic embedding for a truly unseen target domain. Extensive experiments verify that META surpasses state-of-the-art DG person ReID methods by a large margin. Our code is available at https://github.com/xbq1994/META.

preprint | 2022 | arXiv

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches already generated as the context for the current patch being generated, which can significantly save computation costs without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images with arbitrary sizes and support long-duration video generation additionally. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.

preprint | 2022 | arXiv

Proton momentum and angular momentum decompositions with overlap fermions

We present a calculation of the proton momentum and angular momentum decompositions using overlap fermions on a $2+1$-flavor RBC/UKQCD domain-wall lattice at 0.143 fm with a pion mass of 171 MeV, which is close to the physical one. A complete determination of the momentum and angular momentum fractions carried by up, down, strange and glue inside the proton has been done with valence pion masses varying from 171 to 391 MeV. We have utilized the fast Fourier transform on the stochastic-sandwich method for the connected-insertion parts, and the cluster-decomposition error reduction technique has been used to reduce statistical errors for the disconnected-insertion parts. The full nonperturbative renormalization and mixing between the quark and glue operators are carried out. The final results are normalized with the momentum and angular momentum sum rules and reported at the physical valence pion mass at ${\overline{\rm {MS}}}\, (μ= 2\ {\rm{GeV}})$. The renormalized momentum fractions for the quarks and glue are $\langle x \rangle^q = 0.491(20)(23)$ and $\langle x \rangle^g = 0.509(20)(23)$, respectively, and the renormalized total angular momentum fractions for quarks and glue are $2 J^q = 0.539(22)(44)$ and $2 J^g = 0.461(22)(44)$, respectively. The quark spin fraction is $Σ= 0.405(25)(37)$ from our previous work and the quark orbital angular momentum fraction is deduced from $2 L^q = 2 J^q - Σ$ to be $0.134(22)(44)$.
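
The quoted central values can be cross-checked against the sum rules and the stated relation for the orbital angular momentum. The arithmetic below uses only numbers given in the abstract and ignores the uncertainties:

```latex
\begin{align*}
\langle x \rangle^q + \langle x \rangle^g &= 0.491 + 0.509 = 1.000, \\
2J^q + 2J^g &= 0.539 + 0.461 = 1.000, \\
2L^q = 2J^q - \Sigma &= 0.539 - 0.405 = 0.134.
\end{align*}
```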

preprint | 2022 | arXiv

ProxyMix: Proxy-based Mixup Training with Label Refinery for Source-Free Domain Adaptation

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Owing to privacy concerns and heavy data transmission, source-free UDA, exploiting the pre-trained source models instead of the raw source data for target learning, has been gaining popularity in recent years. Some works attempt to recover unseen source domains with generative models, but this introduces additional network parameters. Other works propose to fine-tune the source model by pseudo labels, but noisy pseudo labels may misguide the decision boundary, leading to unsatisfactory results. To tackle these issues, we propose an effective method named Proxy-based Mixup training with label refinery (ProxyMix). First of all, to avoid additional parameters and explore the information in the source model, ProxyMix defines the weights of the classifier as the class prototypes and then constructs a class-balanced proxy source domain by the nearest neighbors of the prototypes to bridge the unseen source domain and the target domain. To improve the reliability of pseudo labels, we further propose the frequency-weighted aggregation strategy to generate soft pseudo labels for unlabeled target data. The proposed strategy exploits the internal structure of target features, pulls target features to their semantic neighbors, and increases the weights of samples from low-frequency classes during gradient updating. With the proxy domain and the reliable pseudo labels, we employ two kinds of mixup regularization, i.e., inter- and intra-domain mixup, in our framework, to align the proxy and the target domain, enforcing the consistency of predictions, thereby further mitigating the negative impacts of noisy labels. Experiments on three 2D image and one 3D point cloud object recognition benchmarks demonstrate that ProxyMix yields state-of-the-art performance for source-free UDA tasks.
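
For readers unfamiliar with mixup, the operation underlying the regularization mentioned above can be sketched as a convex combination of two samples and their soft labels. This is a generic illustration, not the ProxyMix code: the function names are hypothetical, and drawing the mixing coefficient from a symmetric Beta distribution with `alpha = 0.3` is the conventional mixup choice, assumed here rather than taken from the abstract.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_lambda(alpha: float = 0.3, rng: Optional[random.Random] = None) -> float:
    """Draw a mixing coefficient in [0, 1] from Beta(alpha, alpha) (assumed choice)."""
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def mixup(
    x_a: Sequence[float],
    y_a: Sequence[float],
    x_b: Sequence[float],
    y_b: Sequence[float],
    lam: float,
) -> Tuple[List[float], List[float]]:
    """Blend two feature vectors and their soft labels with weight lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("paired inputs must have matching lengths")
    # Convex combination: lam of sample a plus (1 - lam) of sample b.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y
```

Under this reading, inter-domain mixup would pair a proxy-source sample with a target sample, and intra-domain mixup would pair two samples from the same domain; only the choice of which samples to pair differs.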

preprint | 2020 | arXiv

A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation

This work addresses the unsupervised domain adaptation problem, especially in the case of class labels in the target domain being only a subset of those in the source domain. Such a partial transfer setting is realistic but challenging and existing methods always suffer from two key problems, negative transfer and uncertainty propagation. In this paper, we build on domain adversarial learning and propose a novel domain adaptation method BA$^3$US with two new techniques termed Balanced Adversarial Alignment (BAA) and Adaptive Uncertainty Suppression (AUS), respectively. On one hand, negative transfer results in misclassification of target samples to the classes only present in the source domain. To address this issue, BAA pursues the balance between label distributions across domains in a fairly simple manner. Specifically, it randomly leverages a few source samples to augment the smaller target domain during domain alignment so that classes in different domains are symmetric. On the other hand, a source sample would be denoted as uncertain if there is an incorrect class that has a relatively high prediction score, and such uncertainty easily propagates to unlabeled target data around it during alignment, which severely deteriorates adaptation performance. Thus we present AUS that emphasizes uncertain samples and exploits an adaptive weighted complement entropy objective to encourage incorrect classes to have uniform and low prediction scores. Experimental results on multiple benchmarks demonstrate our BA$^3$US surpasses state-of-the-art methods for partial domain adaptation tasks. Code is available at https://github.com/tim-learn/BA3US.

preprint | 2020 | arXiv

Adversarial Infidelity Learning for Model Interpretation

Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check if the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score of each feature representing the data samples to explain how the model generates the specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. Also, we focus on the following setting: using selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we involve the output of the given model as an additional input to learn an explainer based on more accurate information. To learn the explainer, besides fidelity, we propose an Adversarial Infidelity Learning (AIL) mechanism to boost the explanation learning by screening relatively unimportant features. Through theoretical and experimental analysis, we show that our AIL mechanism can help learn the desired conditional distribution between selected features and targets. Moreover, we extend our framework by integrating efficient interpretation methods as proper priors to provide a warm start. Comprehensive empirical evaluation results are provided by quantitative metrics and human evaluation to demonstrate the effectiveness and superiority of our proposed method. Our code is publicly available online at https://github.com/langlrsw/MEED.

preprint | 2020 | arXiv

General-Purpose User Embeddings based on Mobile App Usage

In this paper, we report our recent practice at Tencent for user modeling based on mobile app usage. User behaviors on mobile app usage, including retention, installation, and uninstallation, can be a good indicator for both long-term and short-term interests of users. For example, if a user installs Snapseed recently, she might have a growing interest in photographing. Such information is valuable for numerous downstream applications, including advertising, recommendations, etc. Traditionally, user modeling from mobile app usage heavily relies on handcrafted feature engineering, which requires onerous human work for different downstream applications, and could be sub-optimal without domain experts. However, automatic user modeling based on mobile app usage faces unique challenges, including (1) retention, installation, and uninstallation are heterogeneous but need to be modeled collectively, (2) user behaviors are distributed unevenly over time, and (3) many long-tailed apps suffer from serious sparsity. In this paper, we present a tailored AutoEncoder-coupled Transformer Network (AETN), by which we overcome these challenges and achieve the goals of reducing manual efforts and boosting performance. We have deployed the model at Tencent, and both online/offline experiments from multiple domains of downstream applications have demonstrated the effectiveness of the output user embeddings.

preprint | 2020 | arXiv

Hybrid Differentially Private Federated Learning on Vertically Partitioned Data

We present HDP-VFL, the first hybrid differentially private (DP) framework for vertical federated learning (VFL) to demonstrate that it is possible to jointly learn a generalized linear model (GLM) from vertically partitioned data with only a negligible cost, w.r.t. training time, accuracy, etc., compared to idealized non-private VFL. Our work builds on the recent advances in VFL-based collaborative training among different organizations which rely on protocols like Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC) to secure computation and training. In particular, we analyze how VFL's intermediate result (IR) can leak private information of the training data during communication and design a DP-based privacy-preserving algorithm to ensure the data confidentiality of VFL participants. We mathematically prove that our algorithm not only provides utility guarantees for VFL, but also offers multi-level privacy, i.e. DP w.r.t. IR and joint differential privacy (JDP) w.r.t. model weights. Experimental results demonstrate that our work, under adequate privacy budgets, is quantitatively and qualitatively similar to GLMs learned in the idealized non-private VFL setting, without the increased cost in memory and processing time incurred by most prior works based on HE or MPC. Our code will be released if this paper is accepted.

preprint2020arXiv

PDFs and Neutrino-Nucleon Scattering from Hadronic Tensor

We review the Euclidean path-integral formulation of the nucleon hadronic tensor and classify the gauge invariant and topologically distinct insertions in terms of connected and disconnected insertions and also in terms of leading and higher-twist contributions in the DIS region. Converting the Euclidean hadronic tensor back to the Minkowski space requires solving an inverse problem of the Laplace transform. We have investigated several inverse algorithms and studied the pros and cons of each. We show a result with a relatively large momentum transfer ($Q^2 \sim 4\, {\rm GeV^2}$) to suppress the elastic scattering and reveal the contributions from the resonance and inelastic region of the neutrino-nucleon scattering. For elastic scattering, the hadronic tensor is the product of the elastic form factors for the two corresponding currents. We checked numerically for the case of two charge vector currents ($V_4$) with the electric form factor calculated from the three-point function and found they agree within errors.

preprint2020arXiv

Ratio of strange to $u/d$ momentum fraction in disconnected insertions

The ratio of the strange quark momentum fraction $\langle x\rangle_{s+\bar{s}}$ to that of light quark $u$ or $d$ in disconnected insertions (DI) is calculated on the lattice with overlap fermions on four domain wall fermion ensembles. These ensembles cover three lattice spacings, three volumes and several pion masses including the physical one, from which a global fitting is carried out. A complete nonperturbative renormalization and the mixing between the quark and glue operators are taken into account. We find the ratio to be $\langle x\rangle_{s+\bar{s}}/\langle x\rangle_{u+\bar{u}} ({\rm DI})=0.795(79)(77)$ at $μ= 2$ GeV in the $\overline{\rm MS}$ scheme. This ratio can be used as a constraint to better determine the strange parton distribution especially in the small $x$ region in the global fittings of PDFs when the connected and disconnected sea are fitted and evolved separately, demonstrating a new way that connects lattice calculations with global analyses.

preprint2020arXiv

Relation-Guided Representation Learning

Deep auto-encoders (DAEs) have achieved great success in learning data representations via the powerful representability of neural networks. But most DAEs only focus on the most dominant structures which are able to reconstruct the data from a latent space and neglect rich latent structural information. In this work, we propose a new representation learning method that explicitly models and leverages sample relations, which in turn is used as supervision to guide the representation learning. Different from previous work, our framework well preserves the relations between samples. Since the prediction of pairwise relations themselves is a fundamental problem, our model adaptively learns them from data. This provides much flexibility to encode real data manifold. The important role of relation and representation learning is evaluated on the clustering task. Extensive experiments on benchmark data sets demonstrate the superiority of our approach. By seeking to embed samples into subspace, we further show that our method can address the large-scale and out-of-sample problem.

preprint2020arXiv

Roper State from Overlap Fermions

The Roper state is extracted with valence overlap fermions on a $2+1$-flavor domain-wall fermion lattice (spacing $a = 0.114$ fm and $m_π = 330$ MeV) using both the Sequential Empirical Bayes (SEB) method and the variational method. The results are consistent, provided that a large smearing-size interpolation operator is included in the variational calculation to have better overlap with the lowest radial excitation. Similar calculations carried out for an anisotropic clover lattice with similar parameters find the Roper $\approx 280$ MeV higher than that of the overlap fermion. The fact that the prediction of the Roper state by overlap fermions is consistently lower than those of clover fermions, chirally improved fermions, and twisted-mass fermions over a wide range of pion masses has been dubbed a "Roper puzzle." To understand the origin of this difference, we study the hairpin $Z$-diagram in the isovector scalar meson ($a_0$) correlator in the quenched approximation. Comparing the $a_0$ correlators for clover and overlap fermions, at a pion mass of 290 MeV, we find that the spectral weight of the ghost state with clover fermions is smaller than that of the overlap at $a = 0.12$ fm and $0.09$ fm, whereas the whole $a_0$ correlators of clover and overlap at $a = 0.06$ fm coincide within errors. This suggests that chiral symmetry is restored for clover at $a \le 0.06$ fm and that the Roper should come down at and below this $a$. We conclude that this work supports a resolution of the "Roper puzzle" due to $Z$-graph type chiral dynamics. This entails coupling to higher components in the Fock space (e.g. $Nπ$, $Nππ$ states) to induce the effective flavor-spin interaction between quarks as prescribed in the chiral quark model, resulting in the parity-reversal pattern as observed in the experimental excited states of $N, Δ$ and $Λ$.

preprint2020arXiv

The nucleon isovector tensor charge from lattice QCD using chiral fermions

In this work we present the isovector flavor combination for the nucleon tensor charge extracted from lattice QCD simulations using overlap fermions on $N_f=2+1$ domain-wall configurations. The pion mass dependence is studied using six valence quark masses, each reproducing a value for the pion mass in the valence sector between 147 and 330 MeV. We investigate and eliminate systematic uncertainties due to contamination by excited states by employing several values for the source-sink separation that span from 1 fm to 1.6 fm. We apply a chiral extrapolation in the valence sector using a quadratic and a logarithmic term to fit the pion mass dependence, which describes the lattice data well. The lattice matrix element is renormalized non-perturbatively, and the final result is $g_T=1.096(30)$ in the $\overline{\rm MS}$ scheme at a renormalization scale of 2 GeV.

preprint2020arXiv

Trace anomaly and dynamical quark mass

We investigated the origin of the RI'/MOM quark mass under the Landau gauge at the non-perturbative scale, using the chiral fermion with different quark masses and lattice spacings. Our result confirms that such a mass is non-vanishing based on the linear extrapolation to the chiral and continuum limit, and shows that such a mass comes from the spontaneous chiral symmetry breaking induced by the near zero modes with the eigenvalue $λ<{\cal O}(5m_q)$, and is proportional to the quark matrix element of the trace anomaly at least down to $\sim 1.3$ GeV.

preprint2020arXiv

Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning

Transfer learning has become a common practice for training deep learning models with limited labeled data in a target domain. On the other hand, deep models are vulnerable to adversarial attacks. Though transfer learning has been widely applied, its effect on model robustness is unclear. To investigate this question, we conduct extensive empirical evaluations to show that fine-tuning effectively enhances model robustness under white-box FGSM attacks. We also propose a black-box attack method for transfer learning models, which attacks the target model with the adversarial examples produced by its source model. To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model. Empirical results show that the adversarial examples are more transferable when fine-tuning is used than they are when the two networks are trained independently.

preprint2019arXiv

Towards the nucleon hadronic tensor from lattice QCD

We present the first calculation of the hadronic tensor on the lattice for the nucleon. The hadronic tensor can be used to extract the structure functions in deep inelastic scattering and also provides information for the neutrino-nucleon scattering, which is crucial to the neutrino-nucleus scattering experiments at low energies. The most challenging part of the calculation is to solve an inverse problem. We have implemented and tested three algorithms using mock data, showing that the Bayesian Reconstruction method has the best resolution in extracting peak structures, while the Backus-Gilbert and Maximum Entropy methods are somewhat more stable for the flat spectral function. Numerical results are presented for both the elastic case (clover fermions on domain wall configuration with $m_π\sim$ 370 MeV and $a\sim$ 0.06 fm) and a case (anisotropic clover lattice with $m_π\sim$ 380 MeV and $a_t\sim$ 0.035 fm) with large momentum transfer. For the former case, the reconstructed Minkowski hadronic tensor gives precisely the vector charge, which proves the feasibility of the approach. For the latter case, the nucleon resonances and possibly shallow inelastic scattering contributions around $ν=1$ GeV are clearly observed, but no information is obtained for higher excited states with $ν>2$ GeV. A check of the effective masses of the $ρ$ meson with different lattice setups indicates that, in order to reach higher energy transfers, using lattices with smaller lattice spacings is essential.