Researcher profile

Hui Gao

Hui Gao contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
9 topics
4 close collaborators

Published work

10 published items

preprint, 2026, arXiv

Applying Embedding-Based Retrieval to Airbnb Search

The goal of Airbnb search is to match guests with the ideal accommodation that fits their travel needs. This is a challenging problem, as popular search locations can have around a hundred thousand available homes, and guests themselves have a wide variety of preferences. Furthermore, the launch of new product features, such as flexible date search, significantly increased the number of eligible homes per search query. As such, there is a need for a sophisticated retrieval system that can provide high-quality candidates with low latency in a way that integrates with the overall ranking stack. This paper details our journey to build an efficient and high-quality retrieval system for Airbnb search. We describe the key challenges we encountered when implementing an Embedding-Based Retrieval (EBR) system for a two-sided marketplace like Airbnb, such as the dynamic nature of the inventory, a lengthy user funnel with multiple stages, and a variety of product surfaces. We cover insights into modeling the retrieval problem, how to build robust evaluation systems, and design choices for online serving. The EBR system was launched to production and powers several use cases, such as regular search, flexible date search, and promotional emails for marketing campaigns. The system demonstrated statistically significant improvements in key metrics, such as booking conversion, via A/B testing.
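
As a generic illustration of embedding-based retrieval (not Airbnb's production system, whose model and serving details the abstract does not give), the sketch below scores listing embeddings against a query embedding by dot product and returns the top k. The function names, listing ids, and vectors are all hypothetical, and the exhaustive scan stands in for whatever index a real low-latency system would use.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec, listing_vecs, k):
    """Return the ids of the k listings with the highest dot-product score.

    listing_vecs maps a (hypothetical) listing id to its embedding.
    Every listing is scored, so this is only practical for small inventories.
    """
    scored = (
        (dot(query_vec, vec), listing_id)
        for listing_id, vec in listing_vecs.items()
    )
    return [listing_id for _, listing_id in heapq.nlargest(k, scored)]


# Toy two-dimensional embeddings, invented for illustration.
listings = {
    "cabin": [0.9, 0.1],
    "loft": [0.2, 0.8],
    "villa": [0.7, 0.6],
}
top = retrieve_top_k([1.0, 0.0], listings, k=2)
```

Here `top` holds the two listings whose embeddings align best with the query vector, ordered from best to worst score.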

preprint, 2025, arXiv

Large Emotional World Model

World Models serve as tools for understanding the current state of the world and predicting its future dynamics, with broad application potential across numerous fields. As a key component of world knowledge, emotion significantly influences human decision-making. While existing Large Language Models (LLMs) have shown preliminary capability in capturing world knowledge, they primarily focus on modeling physical-world regularities and lack systematic exploration of emotional factors. In this paper, we first demonstrate the importance of emotion in understanding the world by showing that removing emotionally relevant information degrades reasoning performance. Inspired by theory of mind, we further propose a Large Emotional World Model (LEWM). Specifically, we construct the Emotion-Why-How (EWH) dataset, which integrates emotion into causal relationships and enables reasoning about why actions occur and how emotions drive future world states. Based on this dataset, LEWM explicitly models emotional states alongside visual observations and actions, allowing the world model to predict both future states and emotional transitions. Experimental results show that LEWM more accurately predicts emotion-driven social behaviors while maintaining comparable performance to general world models on basic tasks.

preprint, 2022, arXiv

Digraph analogues for the Nine Dragon Tree Conjecture

The fractional arboricity of a digraph $D$, denoted by $γ(D)$, is defined as $γ(D)= \max_{H \subseteq D, |V(H)| >1} \frac {|A(H)|} {|V(H)|-1}$. Frank in [Covering branching, Acta Scientiarum Mathematicarum (Szeged) 41 (1979), 77-81] proved that a digraph $D$ decomposes into $k$ branchings if and only if $Δ^{-}(D) \leq k$ and $γ(D) \leq k$. In this paper, we study digraph analogues for the Nine Dragon Tree Conjecture. We conjecture that, for positive integers $k$ and $d$, if $D$ is a digraph with $γ(D) \leq k + \frac{d-k}{d+1}$ and $Δ^{-}(D) \leq k+1$, then $D$ decomposes into $k + 1$ branchings $B_{1}, \ldots, B_{k}, B_{k+1}$ with $Δ^{+}(B_{k+1}) \leq d$. This conjecture, if true, is a refinement of Frank's characterization. A series of acyclic bipartite digraphs is also presented to show that the bound on $γ(D)$ given in the conjecture is best possible. We prove our conjecture for the cases $d \leq k$. As more evidence to support our conjecture, we prove that if $D$ is a digraph with maximum average degree $\mathrm{mad}(D) \leq 2k + \frac{2(d-k)}{d+1}$ and $Δ^{-}(D) \leq k+1$, then $D$ decomposes into $k + 1$ pseudo-branchings $C_{1}, \ldots, C_{k}, C_{k+1}$ with $Δ^{+}(C_{k+1}) \leq d$.
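
To make the definition of fractional arboricity concrete, here is a minimal brute-force sketch for small digraphs. It relies on one observation not stated in the abstract: for a fixed vertex set the induced subdigraph has the most arcs, so only induced subdigraphs need to be scanned. The function names are hypothetical, and `min_branchings` simply reads the smallest admissible $k$ off Frank's characterization as quoted above.

```python
import math
from fractions import Fraction
from itertools import combinations


def fractional_arboricity(vertices, arcs):
    """Brute-force max of |A(H)| / (|V(H)| - 1) over subdigraphs H with |V(H)| > 1.

    Scans every vertex subset of size at least 2 and counts the arcs it
    induces. Exponential time, so intended for tiny examples only.
    """
    vertices = list(vertices)
    best = Fraction(0)
    for size in range(2, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            arc_count = sum(1 for (u, v) in arcs if u in chosen and v in chosen)
            best = max(best, Fraction(arc_count, size - 1))
    return best


def max_in_degree(vertices, arcs):
    """Largest number of arcs entering any single vertex."""
    return max(sum(1 for (_, v) in arcs if v == w) for w in vertices)


def min_branchings(vertices, arcs):
    """Smallest k with max in-degree <= k and fractional arboricity <= k."""
    gamma = fractional_arboricity(vertices, arcs)
    return max(max_in_degree(vertices, arcs), math.ceil(gamma))


# Directed triangle a -> b -> c -> a.
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
gamma = fractional_arboricity("abc", triangle)
k = min_branchings("abc", triangle)
```

For the directed triangle the maximum is attained by the whole digraph (3 arcs over 2), so one branching cannot cover it even though every in-degree is 1.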

preprint, 2022, arXiv

Multi-Head Online Learning for Delayed Feedback Modeling

In online advertising, it is highly important to predict the probability and the value of a conversion (e.g., a purchase). This not only affects user experience by showing relevant ads, but also affects the ROI of advertisers and the revenue of marketplaces. Unlike clicks, which often occur within minutes after impressions, conversions are expected to happen over a long period of time (e.g., 30 days for online shopping). This creates a challenge, as the true labels are only available after long delays: either inaccurate labels (partial conversions) are used, or models are trained on stale data (e.g., from 30 days ago). The problem is more prominent in online learning, which focuses on live performance on the latest data. In this paper, a novel solution is presented to address this challenge using multi-head modeling. Unlike traditional methods, it directly quantizes conversions into multiple windows, such as day 1, day 2, day 3-7, and day 8-30. A sub-model is trained specifically on conversions within each window. Label freshness is maximally preserved in early models (e.g., day 1 and day 2), while late conversions are accurately utilized in models with longer delays (e.g., day 8-30). It is shown to greatly exceed the performance of known methods in online learning experiments for both conversion rate (CVR) and value per click (VPC) predictions. Lastly, as a general method for delayed feedback modeling, it can be combined with any advanced ML technique to further improve performance.
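
The windowing step can be sketched as follows. The window boundaries come from the abstract; everything else is an assumption made for illustration, including the 1-based day convention, the function names, and the choice to give each head a binary label that is 1 only when the conversion falls in that head's window. The paper's actual label construction and training procedure are not described here.

```python
# Windows named in the abstract: day 1, day 2, days 3-7, days 8-30
# (treated here as inclusive ranges of 1-based days since the click).
WINDOWS = [(1, 1), (2, 2), (3, 7), (8, 30)]


def head_for_delay(delay_day):
    """Return the index of the window containing delay_day, or None if no window does."""
    for index, (first, last) in enumerate(WINDOWS):
        if first <= delay_day <= last:
            return index
    return None


def multi_head_labels(delay_day):
    """Build one binary label per head for a single click.

    delay_day is the day on which the conversion happened, or None when
    no conversion was observed. At most one head receives a positive label.
    """
    head = None if delay_day is None else head_for_delay(delay_day)
    return [1 if index == head else 0 for index in range(len(WINDOWS))]


labels_day5 = multi_head_labels(5)
labels_none = multi_head_labels(None)
```

Under this convention, the day 1 head can be trained one day after the click, while the day 8-30 head must wait for the full 30 days before its label is final.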

preprint, 2020, arXiv

Integral $p$-adic Hodge theory in the imperfect residue field case

Let $K$ be a mixed characteristic complete discrete valuation field with residue field admitting a finite $p$-basis, and let $G_K$ be the Galois group. We first classify semi-stable representations of $G_K$ by weakly admissible filtered $(φ,N)$-modules with connections. We then construct a fully faithful functor from the category of integral semi-stable representations of $G_K$ to the category of Breuil-Kisin $G_K$-modules. Using the integral theory, we classify $p$-divisible groups over the ring of integers of $K$ by minuscule Breuil-Kisin modules with connections.

preprint, 2020, arXiv

Machine Learning Empowered Beam Management for Intelligent Reflecting Surface Assisted MmWave Networks

Recently, intelligent reflecting surface (IRS) assisted millimeter wave (mmWave) networks have emerged, with the potential to address the blockage issue of mmWave communication in a more cost-effective way. In particular, an IRS is built from passive and programmable electromagnetic elements that can manipulate the mmWave propagation channel into a more favorable, blockage-free condition via judicious joint BS-IRS transmission design. However, the coexistence of IRSs and mmWave BSs complicates the network architecture and thus poses great challenges for efficient beam management (BM), a critical prerequisite for high-performance mmWave networks. In this paper, we systematically evaluate the key issues and challenges of BM for IRS-assisted mmWave networks to bring insights into future network design. Specifically, we carefully classify and discuss the extensibility and limitations of existing BM for conventional mmWave networks with respect to the new IRS-assisted paradigm. Moreover, we propose a novel machine learning empowered BM framework for IRS-assisted networks with representative showcases, which uses environmental and mobility awareness to achieve highly efficient BM with significantly reduced system overhead. Finally, some interesting future directions are suggested to inspire further research.

preprint, 2020, arXiv

OTFS Based Receiver Scheme With Multi-Antennas in High-Mobility V2X Systems

Vehicle-to-everything (V2X) is considered one of the most important applications of future wireless communication networks. However, the Doppler effect caused by vehicle mobility may seriously degrade the performance of vehicular communication links, especially when the channels exhibit a large number of Doppler frequency offsets (DFOs). Orthogonal time frequency space (OTFS) is a new waveform designed in the delay-Doppler domain that can effectively convert a doubly dispersive channel into an almost non-fading channel, which makes it very attractive for V2X communications. In this paper, we design a novel OTFS-based multi-antenna receiver to deal with the high-mobility challenges in V2X systems. We show that the multiple DFOs associated with multipaths can be separated using the high spatial resolution provided by multiple antennas, which enhances the sparsity of the OTFS channel in the delay-Doppler domain and has the potential to reduce the complexity of the message passing (MP) detection algorithm. Based on this observation, we further propose a joint MP and maximum ratio combining (MRC) iterative detection scheme for OTFS, where the integration of MRC significantly improves the convergence of the iteration and yields excellent system error performance. Finally, we provide numerical simulation results to corroborate the superiority of the proposed scheme.
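
For readers unfamiliar with maximum ratio combining, the sketch below shows the textbook single-symbol form only, assuming each antenna observes $y_i = h_i x + n_i$ with a known flat channel gain $h_i$. It is not the paper's joint MP-MRC detector, and the function name, channel gains, and symbol are hypothetical values chosen for illustration.

```python
def mrc_combine(received, channels):
    """Combine per-antenna observations of one symbol by maximum ratio combining.

    Each observation is weighted by the complex conjugate of its channel
    gain, and the sum is normalized by the total channel power, so that a
    noise-free input returns the transmitted symbol exactly.
    """
    numerator = sum(h.conjugate() * y for h, y in zip(channels, received))
    total_power = sum(abs(h) ** 2 for h in channels)
    return numerator / total_power


# Hypothetical transmitted symbol and per-antenna channel gains.
symbol = 1 + 1j
channels = [0.8 + 0.3j, -0.2 + 0.9j, 0.5 - 0.4j]

# Noise-free observations, so the estimate should match the symbol.
received = [h * symbol for h in channels]
estimate = mrc_combine(received, channels)
```

With noise present, the conjugate weighting favors antennas with stronger channels, which is what maximizes the post-combining signal-to-noise ratio when the noise is independent and equally strong across antennas.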

preprint, 2020, arXiv

Packing branchings under cardinality constraints on their root sets

Edmonds' fundamental theorem on arborescences characterizes the existence of $k$ pairwise arc-disjoint spanning arborescences with prescribed root sets in a digraph. In this paper, we study the problem of packing branchings in digraphs under cardinality constraints on their root sets by arborescence augmentation. Let $D=(V+x,A)$ be a digraph, $\mathcal{P}=\{I_{1}, \ldots, I_{l} \}$ be a partition of $[k]$, $c_{1}, \ldots, c_{l}, c'_{1}, \ldots, c'_{l}$ be nonnegative integers such that $c_α \leq c'_α$ for $α\in [l]$, and $F_{1}, \ldots, F_{k}$ be $k$ arc-disjoint $x$-arborescences in $D$ such that $\sum_{i \in I_α}d_{F_{i}}^{+}(x) \leq c'_α$ for $α\in [l]$. We give a characterization of when $F_{1}, \ldots, F_{k}$ can be completed to arc-disjoint spanning $x$-arborescences $F^{*}_{1}, \ldots, F^{*}_{k}$ such that for any $α\in [l]$, $c_α \leq \sum_{i \in I_α}d^{+}_{F^{*}_{i}}(x) \leq c'_α$.

preprint, 2020, arXiv

Packing of maximal independent mixed arborescences

Király in [On maximal independent arborescence packing, SIAM J. Discrete Math. 30 (4) (2016), 2107-2114] solved the following packing problem: Given a digraph $D = (V, A)$, a matroid $M$ on a set $S = \{s_{1}, \ldots,s_{k} \}$ along with a map $π: S \rightarrow V$, find $k$ arc-disjoint maximal arborescences $T_{1}, \ldots ,T_{k}$ with roots $π(s_{1}), \ldots ,π(s_{k})$, such that, for any $v \in V$, the set $\{s_{i} : v \in V(T_{i})\}$ is independent and its rank reaches the theoretical maximum. In this paper, we give a new characterization for packing of maximal independent mixed arborescences under matroid constraints. This new characterization is simplified to the form of finding a supermodular function that should be covered by an orientation of each strong component of a matroid-based rooted mixed graph. Our proofs come with a polynomial-time algorithm. Our new characterization extends Király's result to mixed graphs, which answers a question that has already attracted some attention.

preprint, 2020, arXiv

Packing of spanning mixed arborescences

In this paper, we characterize a mixed graph $F$ that contains $k$ edge- and arc-disjoint spanning mixed arborescences $F_{1}, \ldots, F_{k}$, such that for each $v \in V(F)$, the cardinality of $\{i \in [k]: v \text{ is the root of } F_{i}\}$ lies in some prescribed interval. This generalizes both Nash-Williams and Tutte's theorem on spanning tree packing for undirected graphs and the earlier characterization for digraphs given by Cai [in: Arc-disjoint arborescences of digraphs, J. Graph Theory 7(2) (1983), 235-240] and Frank [in: On disjoint trees and arborescences, Algebraic Methods in Graph Theory, Colloquia Mathematica Soc. J. Bolyai, Vol. 25 (North-Holland, Amsterdam) (1978), 159-169].