Researcher profile

Nan Lu

Nan Lu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients

Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.

preprint2022arXiv

Improving Di-Higgs Sensitivity at Future Colliders in Hadronic Final States with Machine Learning

One of the central goals of the physics program at the future colliders is to elucidate the origin of electroweak symmetry breaking, including precision measurements of the Higgs sector. This includes a detailed study of Higgs boson (H) pair production, which can reveal the H self-coupling. Since the discovery of the Higgs boson, a large campaign of measurements of the properties of the Higgs boson has begun and many new ideas have emerged during the completion of this program. One such idea is the use of highly boosted and merged hadronic decays of the Higgs boson ($\mathrm{H}\to\mathrm{b}\bar{\mathrm{b}}$, $\mathrm{H}\to\mathrm{W}\mathrm{W}\to\mathrm{q}\bar{\mathrm{q}}\mathrm{q}\bar{\mathrm{q}}$) with machine learning methods to improve the signal-to-background discrimination. In this white paper, we champion the use of these modes to boost the sensitivity of future collider physics programs to Higgs boson pair production, the Higgs self-coupling, and Higgs-vector boson couplings. We demonstrate the potential improvement possible at the Future Circular Collider in hadron mode, especially with the use of graph neural networks.

preprint2022arXiv

Pointwise Binary Classification with Pairwise Confidence Comparisons

To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed. Among them, some consider using pairwise but not pointwise labels, when pointwise labels are not accessible due to privacy, confidentiality, or security reasons. However, as a pairwise label denotes whether or not two data points share a pointwise label, it cannot be easily collected if either point is equally likely to be positive or negative. Thus, in this paper, we propose a novel setting called pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data that we know one is more likely to be positive than the other. Firstly, we give a Pcomp data generation process, derive an unbiased risk estimator (URE) with theoretical guarantee, and further improve URE using correction functions. Secondly, we link Pcomp classification to noisy-label learning to develop a progressive URE and improve it by imposing consistency regularization. Finally, we demonstrate by experiments the effectiveness of our methods, which suggests Pcomp is a valuable and practically useful type of pairwise supervision besides the pairwise label.

preprint2020arXiv

Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach

The recently proposed unlabeled-unlabeled (UU) classification method allows us to train a binary classifier only from two unlabeled datasets with different class priors. Since this method is based on the empirical risk minimization, it works as if it is a supervised classification method, compatible with any model and optimizer. However, this method sometimes suffers from severe overfitting, which we would like to prevent in this paper. Our empirical finding in applying the original UU method is that overfitting often co-occurs with the empirical risk going negative, which is not legitimate. Therefore, we propose to wrap the terms that cause a negative empirical risk by certain correction functions. Then, we prove the consistency of the corrected risk estimator and derive an estimation error bound for the corrected risk minimizer. Experiments show that our proposal can successfully mitigate overfitting of the UU method and significantly improve the classification accuracy.