Researcher profile

Le He

Le He contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

Generative Actor-Critic with Soft Bridge Policies

Expressive generative policies such as diffusion and flow models are appealing for MaxEnt online reinforcement learning because of their ability to model multimodal and highly non-Gaussian action distributions. However, training effective soft generative policies faces two obstacles that often arise together. First, marginal action densities are often unavailable, so existing methods typically rely on entropy bounds, heuristic proxies or approximations. Second, iterative shared-parameter samplers raise inference cost and require backpropagation through time over repeated network evaluations, increasing memory cost and destabilizing policy optimization. These obstacles motivate us to seek a generative policy that exposes a tractable MaxEnt objective while requiring only a single sampled actor forward pass for action generation. To this end, we propose soft generative actor-critic (SoftGAC), whose actor defines a stochastic bridge from a fixed base latent to a terminal action latent in pre-tanh space. This structured bridge allows us to lift the MaxEnt objective as an analytically tractable path-wise relative-entropy objective against a high-entropy reference process. In practical finite-step implementation, this relative entropy reduces exactly to sampled transition control energy and thus provides principled soft regularization. Moreover, we keep the single-pass actor lightweight by using small step-specific bridge transitions, each evaluated only once per sampled action, while maintaining a parameter budget comparable to strong actor baselines. Extensive experiments on challenging continuous-control benchmarks show that SoftGAC attains higher or competitive returns than strong generative policy baselines, including diffusion and flow-matching policies, while staying in the low-latency regime of one-pass actors and showing considerable improvements in the compute-return tradeoff.

preprint2022arXiv

Towards Optimally Efficient Search with Deep Learning for Large-Scale MIMO Systems

This paper investigates the optimal signal detection problem with a particular interest in large-scale multiple-input multiple-output (MIMO) systems. The problem is NP-hard and can be solved optimally by searching the shortest path on the decision tree. Unfortunately, the existing optimal search algorithms often involve prohibitively high complexities, which indicates that they are infeasible in large-scale MIMO systems. To address this issue, we propose a general heuristic search algorithm, namely, hyper-accelerated tree search (HATS) algorithm. The proposed algorithm employs a deep neural network (DNN) to estimate the optimal heuristic, and then use the estimated heuristic to speed up the underlying memory-bounded search algorithm. This idea is inspired by the fact that the underlying heuristic search algorithm reaches the optimal efficiency with the optimal heuristic function. Simulation results show that the proposed algorithm reaches almost the optimal bit error rate (BER) performance in large-scale systems, while the memory size can be bounded. In the meanwhile, it visits nearly the fewest tree nodes. This indicates that the proposed algorithm reaches almost the optimal efficiency in practical scenarios, and thereby it is applicable for large-scale systems. Besides, the code for this paper is available at \url{https://github.com/skypitcher/hats}.

preprint2021arXiv

Learning based signal detection for MIMO systems with unknown noise statistics

This paper aims to devise a generalized maximum likelihood (ML) estimator to robustly detect signals with unknown noise statistics in multiple-input multiple-output (MIMO) systems. In practice, there is little or even no statistical knowledge on the system noise, which in many cases is non-Gaussian, impulsive and not analyzable. Existing detection methods have mainly focused on specific noise models, which are not robust enough with unknown noise statistics. To tackle this issue, we propose a novel ML detection framework to effectively recover the desired signal. Our framework is a fully probabilistic one that can efficiently approximate the unknown noise distribution through a normalizing flow. Importantly, this framework is driven by an unsupervised learning approach, where only the noise samples are required. To reduce the computational complexity, we further present a low-complexity version of the framework, by utilizing an initial estimation to reduce the search space. Simulation results show that our framework outperforms other existing algorithms in terms of bit error rate (BER) in non-analytical noise environments, while it can reach the ML performance bound in analytical noise environments. The code of this paper is available at https://github.com/skypitcher/manfe.