Source author record

Hao He

Hao He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Computer Vision, Artificial Intelligence, astro-ph.GA, Computation and Language, Information Retrieval, Information Theory, math.IT, Methodology, physics.optics, Robotics, Applications, Computer Science and Game Theory, cond-mat.mtrl-sci, Cryptography and Security, eess.IV, math.OC, Neural and Evolutionary Computing, Software Engineering

Catalog footprint

What is connected

23 works

19 topics

4 close collaborators

Preprint, 2024, arXiv

Comparable Demonstrations are Important in In-Context Learning: A Novel Perspective on Demonstration Selection

In-Context Learning (ICL) is an important paradigm for adapting Large Language Models (LLMs) to downstream tasks through a few demonstrations. Despite the great success of ICL, the limitation of the demonstration number may lead to demonstration bias, i.e. the input-label mapping induced by LLMs misunderstands the task's essence. Inspired by human experience, we attempt to mitigate such bias through the perspective of the inter-demonstration relationship. Specifically, we construct Comparable Demonstrations (CDs) by minimally editing the texts to flip the corresponding labels, in order to highlight the task's essence and eliminate potential spurious correlations through the inter-demonstration comparison. Through a series of experiments on CDs, we find that (1) demonstration bias does exist in LLMs, and CDs can significantly reduce such bias; (2) CDs exhibit good performance in ICL, especially in out-of-distribution scenarios. In summary, this study explores the ICL mechanisms from a novel perspective, providing a deeper insight into the demonstration selection strategy for ICL.

Preprint, 2024, arXiv

Randomized algorithms for precise measurement of differentially-private, personalized recommendations

Personalized recommendations form an important part of today's internet ecosystem, helping artists and creators to reach interested users, and helping users to discover new and engaging content. However, many users today are skeptical of platforms that personalize recommendations, in part due to historically careless treatment of personal data and data privacy. Now, businesses that rely on personalized recommendations are entering a new paradigm, where many of their systems must be overhauled to be privacy-first. In this article, we propose an algorithm for personalized recommendations that facilitates both precise and differentially-private measurement. We consider advertising as an example application, and conduct offline experiments to quantify how the proposed privacy-preserving algorithm affects key metrics related to user experience, advertiser value, and platform revenue compared to the extremes of both (private) non-personalized and non-private, personalized implementations.

Preprint, 2024, arXiv

Understanding LLMs: A Comprehensive Overview from Training to Inference

The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There's an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of large language model training techniques and inference deployment technologies aligned with this emerging trend. The discussion on training includes various aspects, including data preprocessing, training architecture, pre-training tasks, parallel training, and relevant content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs' utilization and provides insights into their future development.

Preprint, 2023, arXiv

Contrast with Major Classifier Vectors for Federated Medical Relation Extraction with Heterogeneous Label Distribution

Federated medical relation extraction enables multiple clients to train a deep network collaboratively without sharing their raw medical data. In order to handle the heterogeneous label distribution across clients, most of the existing works only involve enforcing regularization between local and global models during optimization. In this paper, we fully utilize the models of all clients and propose a novel concept of major classifier vectors, where a group of class vectors is obtained in an ensemble rather than the weighted average method on the server. The major classifier vectors are then distributed to all clients and the local training of each client is Contrasted with Major Classifier vectors (FedCMC), so the local model is not prone to overfitting to the local label distribution. FedCMC requires only a small amount of additional transfer of classifier parameters without any leakage of raw data, extracted representations, and label distributions. Our extensive experiments show that FedCMC outperforms the other state-of-the-art FL algorithms on three medical relation extraction datasets.

Preprint, 2022, arXiv

Controlling Directions Orthogonal to a Classifier

We propose to identify directions invariant to a given classifier so that these directions can be controlled in tasks such as style transfer. While orthogonal decomposition is directly identifiable when the given classifier is linear, we formally define a notion of orthogonality in the non-linear case. We also provide a surprisingly simple method for constructing the orthogonal classifier (a classifier utilizing directions other than those of the given classifier). Empirically, we present three use cases where controlling orthogonal variation is important: style transfer, domain adaptation, and fairness. The orthogonal classifier enables desired style transfer when domains vary in multiple aspects, improves domain adaptation with label shifts and mitigates the unfairness as a predictor. The code is available at http://github.com/Newbeeer/orthogonal_classifier

Preprint, 2022, arXiv

Demystifying Software Release Note Issues on GitHub

Release notes (RNs) summarize main changes between two consecutive software versions and serve as a central source of information when users upgrade software. While producing high quality RNs can be hard and poses a variety of challenges to developers, a comprehensive empirical understanding of these challenges is still lacking. In this paper, we bridge this knowledge gap by manually analyzing 1,731 latest GitHub issues to build a comprehensive taxonomy of RN issues with four dimensions: Content, Presentation, Accessibility, and Production. Nearly half of these issues (48.47%) concern Production; Content, Accessibility, and Presentation account for 25.61%, 17.65%, and 8.27%, respectively. We find that: 1) RN producers are more likely to miss information than to include incorrect information, especially for breaking changes; 2) improper layout may bury important information and confuse users; 3) many users find RNs inaccessible due to link deterioration, lack of notification, and obfuscated RN locations; 4) automating and regulating RN production remains challenging despite the great need among RN producers. Our taxonomy not only pictures a roadmap to improve RN production in practice but also reveals interesting future research directions for automating RN production.

Preprint, 2022, arXiv

Domain Adaptation with Factorizable Joint Shift

Existing domain adaptation (DA) usually assumes the domain shift comes from either the covariates or the labels. However, in real-world applications, samples selected from different domains could have biases in both the covariates and the labels. In this paper, we propose a new assumption, Factorizable Joint Shift (FJS), to handle the co-existence of sampling bias in covariates and labels. Although allowing for the shift from both sides, FJS assumes the independence of the bias between the two factors. We provide theoretical and empirical understandings about when FJS degenerates to prior assumptions and when it is necessary. We further propose Joint Importance Aligning (JIA), a discriminative learning objective to obtain joint importance estimators for both supervised and unsupervised domain adaptation. Our method can be seamlessly incorporated with existing domain adaptation algorithms for better importance estimation and weighting on the training data. Experiments on a synthetic dataset demonstrate the advantage of our method.

Preprint, 2022, arXiv

Embedded Young Massive Star Clusters in the Antennae Merger

The properties of young massive clusters (YMCs) are key to understanding the star formation mechanism in starburst systems, especially mergers. We present ALMA high-resolution ($\sim$10 pc) continuum (100 and 345 GHz) data of YMCs in the overlap region of the Antennae galaxy. We identify 6 sources in the overlap region, including two sources that lie in the same giant molecular cloud (GMC). These YMCs correspond well with radio sources in lower resolution continuum (100 and 220 GHz) images at GMC scales ($\sim$60 pc). We find most of these YMCs are bound clusters through virial analysis. We estimate their ages to be $\sim$1 Myr and to be either embedded or just beginning to emerge from their parent cloud. We also compare each radio source with Pa$β$ source and find they have consistent total ionizing photon numbers, which indicates they are tracing the same physical source. By comparing the free-free emission at $\sim$10 pc scale and $\sim$60 pc scale, we find that $\sim$50% of the free-free emission in GMCs actually comes from these YMCs. This indicates that roughly half of the stars in massive GMCs are formed in bound clusters. We further explore the mass correlation between YMCs and GMCs in the Antennae and find it generally agrees with the predictions of the star cluster simulations. The most massive YMC has a stellar mass that is 1% - 5% of its host GMC mass.

Preprint, 2022, arXiv

FedDAR: Federated Domain-Aware Representation Learning

Cross-silo Federated learning (FL) has become a promising tool in machine learning applications for healthcare. It allows hospitals/institutions to train models with sufficient data while the data is kept private. To make sure the FL model is robust when facing heterogeneous data among FL clients, most efforts focus on personalizing models for clients. However, the latent relationships between clients' data are ignored. In this work, we focus on a special non-iid FL problem, called Domain-mixed FL, where each client's data distribution is assumed to be a mixture of several predefined domains. Recognizing the diversity of domains and the similarity within domains, we propose a novel method, FedDAR, which learns a domain shared representation and domain-wise personalized prediction heads in a decoupled manner. For simplified linear regression settings, we have theoretically proved that FedDAR enjoys a linear convergence rate. For general settings, we have performed intensive empirical studies on both synthetic and real-world medical datasets which demonstrate its superiority over prior FL methods.

Preprint, 2022, arXiv

Model and predict age and sex in healthy subjects using brain white matter features: A deep learning approach

The human brain's white matter (WM) structure is of immense interest to the scientific community. Diffusion MRI gives a powerful tool to describe the brain WM structure noninvasively. To potentially enable monitoring of age-related changes and investigation of sex-related brain structure differences on the mapping between the brain connectome and healthy subjects' age and sex, we extract fiber-cluster-based diffusion features and predict sex and age with a novel ensembled neural network classifier. We conduct experiments on the Human Connectome Project (HCP) young adult dataset and show that our model achieves 94.82% accuracy in sex prediction and 2.51 years MAE in age prediction. We also show that the fractional anisotropy (FA) is the most predictive of sex, while the number of fibers is the most predictive of age and the combination of different features can improve the model performance.

Preprint, 2022, arXiv

Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate

Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during the inference, without the need to re-train, re-design, or fine-tune the models, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods.
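The general idea of measuring output variance under inference-time perturbations can be sketched as follows. This is a toy illustration only, not the authors' implementation: `toy_model`, its single hidden unit, the noise scale `sigma`, and the sample count are all hypothetical choices made for the example.

```python
import math
import random
import statistics

def toy_model(x, hidden_noise=0.0):
    """Hypothetical stand-in for a trained network: one hidden unit, tanh output."""
    hidden = 2.0 * x + hidden_noise  # noise is injected at the hidden activation
    return math.tanh(hidden)

def noise_sensitivity(x, sigma=0.1, n_samples=200, seed=0):
    """Variance of the output over repeated random perturbations at inference time."""
    rng = random.Random(seed)
    outputs = [toy_model(x, rng.gauss(0.0, sigma)) for _ in range(n_samples)]
    return statistics.pvariance(outputs)

# tanh is steep near 0, so the output is sensitive to the perturbation there;
# it is saturated near x = 3, so the output barely moves.
steep = noise_sensitivity(0.0)
flat = noise_sensitivity(3.0)
```

No retraining happens here: the same fixed function is evaluated several times, and the spread of its outputs serves as the uncertainty surrogate.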

Preprint, 2021, arXiv

Non-Dispersive Space-Time Wave Packets Propagating in Dispersive Media

Space-time wave packets can propagate invariantly in free space with arbitrary group velocity thanks to the spatio-temporal correlation. Here it is proved that the space-time wave packets are stable in dispersive media as well and free from the spread in time caused by material dispersion. Furthermore, the law of anomalous refraction for space-time wave packets is generalized to the weakly dispersive situation. These results reveal new potential of space-time wave packets for the applications in real dispersive media.

Preprint, 2020, arXiv

Bid Shading by Win-Rate Estimation and Surplus Maximization

This paper describes a new win-rate based bid shading algorithm (WR) that does not rely on the minimum-bid-to-win feedback from a Sell-Side Platform (SSP). The method uses a modified logistic regression to predict the profit from each possible shaded bid price. The function form allows fast maximization at run-time, a key requirement for Real-Time Bidding (RTB) systems. We report production results from this method along with several other algorithms. We found that bid shading, in general, can deliver significant value to advertisers, reducing price per impression to about 55% of the unshaded cost. Further, the particular approach described in this paper captures 7% more profit for advertisers, than do benchmark methods of just bidding the most probable winning price. We also report 4.3% higher surplus than an industry Sell-Side Platform shading service. Furthermore, we observed 3% - 7% lower eCPM, eCPC and eCPA when the algorithm was integrated with budget controllers. We attribute the gains above as being mainly due to the explicit maximization of the surplus function, and note that other algorithms can take advantage of this same approach.
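A minimal sketch of surplus maximization over shaded bids, assuming a first-price auction in which the winner pays its own bid. The logistic `win_rate` curve and its `midpoint` and `steepness` parameters are hypothetical, and the grid search stands in for the fast run-time maximization the abstract mentions, whose functional form is not given here.

```python
import math

def win_rate(bid, midpoint=2.0, steepness=2.0):
    """Hypothetical logistic win-rate curve: probability of winning at a given bid."""
    return 1.0 / (1.0 + math.exp(-steepness * (bid - midpoint)))

def expected_surplus(bid, value):
    """Surplus if we win (value minus price paid) times the chance of winning."""
    return (value - bid) * win_rate(bid)

def shade_bid(value, step=0.01):
    """Grid-search the bid in [0, value] that maximizes expected surplus."""
    n = int(round(value / step))
    candidates = [i * step for i in range(n + 1)]
    return max(candidates, key=lambda b: expected_surplus(b, value))

best = shade_bid(value=4.0)
```

Bidding the full value wins often but leaves zero surplus, while bidding very low rarely wins, so the maximizer lands strictly between the two.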

Preprint, 2020, arXiv

Continuously Indexed Domain Adaptation

Existing domain adaptation focuses on transferring knowledge between domains with categorical indices (e.g., between datasets A and B). However, many tasks involve continuously indexed domains. For example, in medical applications, one often needs to transfer disease analysis and prediction across patients of different ages, where age acts as a continuous domain index. Such tasks are challenging for prior domain adaptation methods since they ignore the underlying relation among domains. In this paper, we propose the first method for continuously indexed domain adaptation. Our approach combines traditional adversarial adaptation with a novel discriminator that models the encoding-conditioned domain index distribution. Our theoretical analysis demonstrates the value of leveraging the domain index to generate invariant features across a continuous range of domains. Our empirical results show that our approach outperforms the state-of-the-art domain adaptation methods on both synthetic and real-world medical datasets.

Preprint, 2020, arXiv

Is this an Early Stage Merger? A Case Study on Molecular Gas and Star Formation Properties of Arp 240

We present new high resolution $^{12}$CO $J$=1-0, $J$=2-1, and $^{13}$CO $J$=1-0 maps of the early stage merger Arp 240 (NGC5257/8) obtained with the Atacama Large Millimeter/submillimeter Array (ALMA). Simulations in the literature suggest that the merger has just completed its first passage; however, we find that this system has a lower global gas fraction but a higher star formation efficiency compared to typical close galaxy pairs, which suggests that this system may already be in an advanced merger stage. We combine the ALMA data with $^{12}$CO $J$=3-2 observations from the Submillimeter Array and carry out RADEX modeling on several different regions. Both the RADEX modeling and a local thermal equilibrium (LTE) analysis show that the regions are most likely to have a CO-to-H$_2$ conversion factor $α_{\mathrm{CO}}$ close to or perhaps even smaller than the typical value for (ultra-)luminous infrared galaxies. Using 33 GHz data from the Very Large Array to measure the star formation rate, we find that most star forming regions have molecular gas depletion times of less than 100 Myr. We calculated the star formation efficiency (SFE) per free-fall time for different regions and find some regions appear to have values greater than 100%. We find these regions generally show evidence for young massive clusters (YMCs). After exploring various factors, we argue that this is mainly due to the fact that radio continuum emission in those regions is dominated by that from YMCs, which results in an overestimate of the SFE per free-fall time.

Preprint, 2020, arXiv

Learning Compositional Koopman Operators for Model-Based Control

Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis. The Koopman operator theory lays the foundation for identifying the nonlinear-to-linear coordinate transformations with data-driven methods. Recently, researchers have proposed to use deep neural networks as a more expressive class of basis functions for calculating the Koopman operators. These approaches, however, assume a fixed dimensional state space; they are therefore not applicable to scenarios with a variable number of objects. In this paper, we propose to learn compositional Koopman operators, using graph neural networks to encode the state into object-centric embeddings and using a block-wise linear transition matrix to regularize the shared structure across objects. The learned dynamics can quickly adapt to new environments of unknown physical parameters and produce control signals to achieve a specified goal. Our experiments on manipulating ropes and controlling soft robots show that the proposed method has better efficiency and generalization ability than existing baselines.

Preprint, 2020, arXiv

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from being fully understood. In this paper, we show that PPO could neither strictly restrict the likelihood ratio as it attempts to do nor enforce a well-defined trust region constraint, which means that it may still suffer from the risk of performance instability. To address this issue, we present an enhanced PPO method, named Truly PPO. Two critical improvements are made in our method: 1) it adopts a new clipping function to support a rollback behavior to restrict the difference between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust region-based one, such that optimizing the resulting surrogate objective function provides guaranteed monotonic improvement of the ultimate policy performance. By adhering more truly to making the algorithm proximal, that is, confining the policy within the trust region, the new algorithm appears to improve on the original PPO in both sample efficiency and performance.
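The difference between flat clipping and a rollback can be sketched per sample as below. `ppo_clip_objective` is the standard PPO clipped surrogate. `rollback_objective` is an illustrative guess at what a rollback clipping function might look like (a negative slope `alpha` outside the clip range); the abstract does not give the exact form, so that function and its `alpha` parameter are assumptions.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

def rollback_objective(ratio, advantage, eps=0.2, alpha=0.3):
    """Illustrative rollback variant (assumed form): negative slope outside the clip range."""
    if ratio <= 1.0 - eps:
        shaped = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    elif ratio >= 1.0 + eps:
        shaped = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    else:
        shaped = ratio
    return min(ratio * advantage, shaped * advantage)

# With a positive advantage, plain clipping is flat once the ratio leaves the range,
# while the rollback variant keeps decreasing, which pushes the ratio back.
flat_a = ppo_clip_objective(1.5, 1.0)
flat_b = ppo_clip_objective(2.0, 1.0)
roll_a = rollback_objective(1.5, 1.0)
roll_b = rollback_objective(2.0, 1.0)
```

With flat clipping the gradient with respect to the ratio vanishes outside the range, so nothing pulls an out-of-range ratio back; the negative slope in the sketch is one way to supply that pull.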

Preprint, 2020, arXiv

UST: Unifying Spatio-Temporal Context for Trajectory Prediction in Autonomous Driving

Trajectory prediction has always been a challenging problem for autonomous driving, since it needs to infer the latent intention from the behaviors and interactions of traffic participants. This problem is intrinsically hard, because each participant may behave differently under different environments and interactions. The key is to effectively model the interlaced influence from both spatial context and temporal context. Existing work usually encodes these two types of context separately, which would lead to inferior modeling of the scenarios. In this paper, we first propose a unified approach to treat time and space dimensions equally for modeling spatio-temporal context. The proposed module is simple and easy to implement within several lines of code. In contrast to existing methods which heavily rely on recurrent neural networks for temporal context and hand-crafted structure for spatial context, our method could automatically partition the spatio-temporal space to adapt to the data. Lastly, we test our proposed framework on two recently proposed trajectory prediction datasets, ApolloScape and Argoverse. We show that the proposed method substantially outperforms the previous state-of-the-art methods while maintaining its simplicity. These encouraging results further validate the superiority of our approach.

Preprint, 2019, arXiv

The CO(3-2)/CO(1-0) luminosity line ratio in nearby star-forming galaxies and AGN from xCOLD GASS, BASS and SLUGS

We study the r31=L'CO(3-2)/L'CO(1-0) luminosity line ratio in a sample of nearby (z < 0.05) galaxies: 25 star-forming galaxies (SFGs) from the xCOLD GASS survey, 36 hard X-ray selected AGN host galaxies from BASS and 37 infrared luminous galaxies from SLUGS. We find a trend for r31 to increase with star-formation efficiency (SFE). We model r31 using the UCL-PDR code and find that the gas density is the main parameter responsible for variation of r31, while the interstellar radiation field and cosmic ray ionization rate play only a minor role. We interpret these results to indicate a relation between SFE and gas density. We do not find a difference in the r31 value of SFGs and AGN host galaxies, when the galaxies are matched in SSFR (<r31> = 0.52 +/- 0.04 for SFGs and <r31> = 0.53 +/- 0.06 for AGN hosts). According to the results of UCL-PDR models, the X-rays can contribute to the enhancement of the CO line ratio, but only for strong X-ray fluxes and for high gas density ($n_{\mathrm{H}} > 10^4$ cm$^{-3}$). We find a mild tightening of the Kennicutt-Schmidt relation when we use the molecular gas mass surface density traced by CO(3-2) (Pearson correlation coefficient R=0.83), instead of the molecular gas mass surface density traced by CO(1-0) (R=0.78), but the increase in correlation is not statistically significant (p-value=0.06). This suggests that the CO(3-2) line can be reliably used to study the relation between SFR and molecular gas for normal SFGs at high redshift, and to compare it with studies of low-redshift galaxies, as is common practice.

Preprint, 2016, arXiv

Prior Distributions for Ranking Problems

The ranking problem is to order a collection of units by some unobserved parameter, based on observations from the associated distribution. This problem arises naturally in a number of contexts, such as business, where we may want to rank potential projects by profitability; or science, where we may want to rank variables potentially associated with some trait by the strength of the association. Most approaches to this problem are empirical Bayesian, where we use the data to estimate the hyperparameters of the prior distribution, then use that distribution to estimate the unobserved parameter values. There are a number of different approaches to this problem, based on different loss functions for mis-ranking units. However, little has been done on the choice of prior distribution. Typical approaches involve choosing a conjugate prior for convenience, and estimating the hyperparameters by MLE from the whole dataset. In this paper, we look in more detail at the effect of choice of prior distribution on Bayesian ranking. We focus on the use of posterior mean for ranking, but many of our conclusions should apply to other ranking criteria, and it is not too difficult to adapt our methods to other choices of prior distributions.
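The empirical Bayes recipe described above (estimate the prior from the data, then rank by posterior mean) can be sketched for a conjugate normal-normal model. This is an illustrative setup, not the paper's method: the normal model with known per-unit noise variances is assumed, `estimate_prior` uses a simple moment-based estimate rather than the MLE the abstract mentions, and all function names are hypothetical.

```python
import statistics

def estimate_prior(observations, noise_vars):
    """Moment-based estimate of the prior mean and variance (an assumed, simple choice)."""
    mu = statistics.fmean(observations)
    tau2 = max(0.0, statistics.pvariance(observations) - statistics.fmean(noise_vars))
    return mu, tau2

def posterior_means(observations, noise_vars, mu, tau2):
    """Normal-normal posterior mean: shrink each observation toward the prior mean."""
    return [(tau2 * x + s2 * mu) / (tau2 + s2)
            for x, s2 in zip(observations, noise_vars)]

def rank_by_posterior_mean(observations, noise_vars, mu, tau2):
    """Indices of units, best first, ordered by posterior mean."""
    post = posterior_means(observations, noise_vars, mu, tau2)
    return sorted(range(len(post)), key=lambda i: post[i], reverse=True)

# Unit 0 has the largest raw observation but is very noisy, so it is shrunk hard.
obs = [3.0, 2.5, 0.0]
noise = [4.0, 0.1, 0.1]
post = posterior_means(obs, noise, mu=0.0, tau2=1.0)
order = rank_by_posterior_mean(obs, noise, mu=0.0, tau2=1.0)
mu_hat, tau2_hat = estimate_prior(obs, noise)
```

In this example the noisy unit drops below the precisely measured one, which shows how the prior variance `tau2` drives the ranking and why the choice of prior matters.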

Preprint, 2015, arXiv

A Coalitional Game for Distributed Inference in Sensor Networks with Dependent Observations

We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.

Preprint, 2014, arXiv

On Quantizer Design for Distributed Bayesian Estimation in Sensor Networks

We consider the problem of distributed estimation under the Bayesian criterion and explore the design of optimal quantizers in such a system. We show that, for a conditionally unbiased and efficient estimator at the fusion center and when local observations have identical distributions, it is optimal to partition the local sensors into groups, with all sensors within a group using the same quantization rule. When all the sensors use identical number of decision regions, use of identical quantizers at the sensors is optimal. When the network is constrained by the capacity of the wireless multiple access channel over which the sensors transmit their quantized observations, we show that binary quantizers at the local sensors are optimal under certain conditions. Based on these observations, we address the location parameter estimation problem and present our optimal quantizer design approach. We also derive the performance limit for distributed location parameter estimation under the Bayesian criterion and find the conditions when the widely used threshold quantizer achieves this limit. We corroborate this result using simulations. We then relax the assumption of conditionally independent observations and derive the optimality conditions of quantizers for conditionally dependent observations. Using counter-examples, we also show that the previous results do not hold in this setting of dependent observations and, therefore, identical quantizers are not optimal.

Preprint, 2012, arXiv

Observation of Multiphoton-induced Fluorescence from Nano Graphene Oxide and Its Applications in In vitro and In vivo Bioimaging

In the present paper, we observed both two-photon and three-photon induced distinct photoluminescence from GO nanoparticles under fs laser excitation. Conjugated with PEG molecules, GO nanoparticles exhibited high chemical stability, and could effectively label HeLa cells. Imaged with a two-photon scanning microscope, GO nanoparticles were observed to localize in the mitochondria, endoplasmic reticulum, Golgi and lysosome of HeLa cells. Furthermore, GO nanoparticles were micro-injected into the brain of a black mouse, and in vivo two-photon luminescence imaging illustrated that GO nanoparticles located at 300 μm depth in the brain could be clearly distinguished.
