Researcher profile

Shao-Lun Huang

Shao-Lun Huang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
13 works
0 followers
8 topics
4 close collaborators

Published work

13 published items

Preprint · 2026 · arXiv

CAVE: A Structured Credit Assignment Approach for Fragmented Visual Evidence Reasoning

Vision-Language Models (VLMs) have achieved strong performance on general multimodal reasoning, yet they still struggle to integrate nonlocal visual information in support of semantically underdetermined visual reasoning. We describe this challenge as Fragmented Visual Reasoning. To address it, we propose Credit Assignment for Visual Evidence (CAVE), a structured process-reward method based on GRPO for interleaved visual reasoning. Specifically, CAVE evaluates the contribution of intermediate steps at the action level via three complementary reasoning-process signals (belief update, evidence acquisition, and adaptive focus control), thereby guiding the model to optimize each reasoning action and learn more reliable visual reasoning strategies. We also construct TRACER-Bench, which covers four nonlocal and semantically confusable reasoning dimensions and provides key intermediate evidence to supervise reasoning paths. Experiments demonstrate that CAVE substantially improves performance on tasks requiring fragmented visual evidence integration, on both public benchmarks and the newly introduced TRACER-Bench, while retaining competitive performance on general multimodal evaluations. Further analyses show that CAVE effectively improves visual reasoning capacity and is more robust under longer-range and deeper cross-region dependencies.
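CAVE is described as a process-reward method built on GRPO. As rough orientation only, the sketch below shows GRPO's group-relative advantage (each sampled rollout's reward standardized against the other rollouts for the same prompt), fed by a hypothetical weighted sum of an outcome reward and the three process signals named in the abstract. The function names, the weighted-sum form, and the default weights are assumptions for illustration; the abstract does not specify how CAVE scores or combines these signals.

```python
import statistics


def combined_reward(outcome, belief, evidence, focus, weights=(1.0, 0.2, 0.2, 0.2)):
    """Hypothetical mix of an outcome reward with three process signals.

    The weighted sum and the default weights are illustrative assumptions,
    not the reward used by CAVE.
    """
    w_o, w_b, w_e, w_f = weights
    return w_o * outcome + w_b * belief + w_e * evidence + w_f * focus


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward against its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four hypothetical rollouts sampled for the same visual question.
group = [
    combined_reward(1.0, 0.6, 0.8, 0.5),
    combined_reward(0.0, 0.3, 0.1, 0.4),
    combined_reward(1.0, 0.2, 0.4, 0.9),
    combined_reward(0.0, 0.7, 0.5, 0.2),
]
advantages = group_relative_advantages(group)
```

Rollouts scoring above the group mean receive positive advantages and the rest negative ones, so no separate value network is needed to produce a baseline.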

Preprint · 2023 · arXiv

Exploring Iterative Refinement with Diffusion Models for Video Grounding

Video grounding aims to localize the target moment in an untrimmed video that corresponds to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span in a single shot, and therefore lack a systematic process for refining predictions. In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process. During training, DiffusionVG progressively adds noise to the target span with a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process. At inference, DiffusionVG generates the target span from Gaussian noise inputs through the learned reverse diffusion process, conditioned on the video-sentence representations. Without bells and whistles, DiffusionVG outperforms existing well-crafted models on the mainstream Charades-STA, ActivityNet Captions, and TACoS benchmarks.
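The abstract frames video grounding as denoising a target span. A minimal sketch of the standard DDPM-style forward process that such a formulation relies on is below, applied to a span written as a normalized (center, width) pair. The linear beta schedule, the 1000 steps, the span parameterization, and the helper names are assumptions; the abstract does not state DiffusionVG's schedule or span encoding.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_span(span, t, alpha_bars, rng=random):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) I) for a clean span x_0.

    Returns the noisy span and the Gaussian noise that produced it, which is
    the usual regression target for a DDPM-style denoiser.
    """
    a_bar = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in span]
    noisy = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * n for x, n in zip(span, noise)]
    return noisy, noise


# Usage: a hypothetical ground-truth moment as normalized (center, width).
alpha_bars = linear_alpha_bars()
x_t, eps = noise_span([0.45, 0.20], t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

A trained denoiser, conditioned on the video-sentence representation, would learn to recover the clean span (or the noise) from `x_t`; at inference it would start from pure Gaussian noise and apply the reverse steps. That network is not sketched here.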

Preprint · 2022 · arXiv

An Information-theoretic Method for Collaborative Distributed Learning with Limited Communication

In this paper, we study the information transmission problem in a distributed learning framework, where each worker node is permitted to transmit only an $m$-dimensional statistic to improve the learning results of the target node. Specifically, we evaluate the corresponding expected population risk (EPR) in the regime of large sample sizes. We prove that performance can be enhanced because the transmitted statistics help estimate the underlying distribution, with the mean squared error measured by the EPR norm matrix. Accordingly, the transmitted statistics correspond to the eigenvectors of this matrix, and the desired transmission allocates these eigenvectors among the statistics so that the EPR is minimized. Moreover, we provide analytical solutions for the desired statistics in single-node and two-node transmission, together with a geometric interpretation that explains the eigenvector selection. For the general case, we develop an efficient algorithm, based on node partitions, that outputs the allocation.

Preprint · 2022 · arXiv

Finding Influential Instances for Distantly Supervised Relation Extraction

Distant supervision (DS) is an effective way to expand datasets for relation extraction (RE) models, but it often suffers from high label noise. Current approaches based on attention, reinforcement learning, or GANs are black-box models, so they provide neither a meaningful interpretation of sample selection in DS nor stability across domains. In contrast, this work proposes REIF, a novel model-agnostic instance sampling method for DS based on the influence function (IF). Our method identifies favorable and unfavorable instances in the bag based on IF and then performs dynamic instance sampling. We design a fast influence sampling algorithm that reduces the computational complexity from $\mathcal{O}(mn)$ to $\mathcal{O}(1)$, and we analyze its robustness with respect to the selected sampling function. Experiments show that by simply sampling the favorable instances during training, REIF outperforms a series of baselines with complicated architectures. We also demonstrate that REIF supports interpretable instance selection.

Preprint · 2022 · arXiv

On Distributed Learning with Constant Communication Bits

In this paper, we study a distributed learning problem constrained by a constant number of communication bits. Specifically, we consider the distributed hypothesis testing (DHT) problem, where two distributed nodes are constrained to transmit a constant number of bits to a central decoder. In such cases, we show that to achieve the optimal error exponents, it suffices to consider the empirical distributions of the observed data sequences and encode them into the transmission bits. With such a coding strategy, we develop a geometric approach in the distribution spaces and establish an inner bound on the error exponent regions. In particular, we show the optimal achievable error exponents and coding schemes for the following cases: (i) both nodes can transmit $\log_2 3$ bits; (ii) one of the nodes can transmit $1$ bit, and the other node is not constrained; (iii) the observations of the two nodes are conditionally independent under one of the hypotheses. Furthermore, we provide several numerical examples to illustrate the theoretical results. Our results provide theoretical guidance for designing practical distributed learning rules, and the developed approach also opens new possibilities for establishing error exponents for DHT with more general communication constraints.

Preprint · 2022 · arXiv

PAC-Bayes Information Bottleneck

Understanding the source of the superior generalization ability of neural networks (NNs) remains one of the most important problems in machine learning (ML) research. A series of theoretical works have tried to derive non-vacuous generalization bounds for NNs. Recently, the compression of information stored in weights (IIW) was proved to play a key role in NN generalization based on the PAC-Bayes theorem. However, no solution for computing IIW has been provided, which is a barrier to further investigating IIW's properties and its potential in practical deep learning. In this paper, we propose an algorithm for the efficient approximation of IIW. Then, we build an IIW-based information bottleneck on the trade-off between the accuracy and the information complexity of NNs, named PIB. From PIB, we can empirically identify the fitting-to-compressing phase transition during NN training and the concrete connection between IIW compression and generalization. We further verify that IIW is able to explain NNs in a broad range of cases, e.g., varying batch sizes, over-parameterization, and noisy labels. Moreover, we propose an MCMC-based algorithm to sample from the optimal weight posterior characterized by PIB, which realizes the potential of IIW to enhance NNs in practice.

Preprint · 2022 · arXiv

Predicting Events in MOBA Games: Prediction, Attribution, and Evaluation

Multiplayer online battle arena (MOBA) games have become increasingly popular in recent years. Consequently, many efforts have been devoted to providing pre-game or in-game predictions for them. However, these works are limited in two respects: 1) the lack of sufficient in-game features; 2) the absence of interpretability in the prediction results. These two limitations greatly restrict the practical performance and industrial application of current works. In this work, we collect and release a large-scale dataset containing rich in-game features for the popular MOBA game Honor of Kings. We then propose to predict four types of important events in an interpretable way by attributing the predictions to the input features using two gradient-based attribution methods: Integrated Gradients and SmoothGrad. To evaluate the explanatory power of different models and attribution methods, we further propose a fidelity-based evaluation metric. Finally, we evaluate the accuracy and fidelity of several competitive methods on the collected dataset to assess how well machines predict events in MOBA games.
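Integrated Gradients, one of the two attribution methods named above, attributes a prediction $f(x)$ to feature $i$ as $(x_i - x'_i)\int_0^1 \partial_i f(x' + \alpha(x - x'))\,d\alpha$ for a baseline $x'$, and the attributions sum to $f(x) - f(x')$. The sketch below approximates the integral with a midpoint Riemann sum on a hypothetical logistic model whose gradient is known in closed form; the model, weights, input, and zero baseline are illustrative assumptions and are unrelated to the paper's Honor of Kings features or models.

```python
import math


def logistic_model(x, w, b):
    """Hypothetical toy predictor: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad(x, w, b):
    """Closed-form gradient of the toy predictor with respect to the input x."""
    p = logistic_model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients along the straight path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = logistic_grad(point, w, b)
        totals = [t + g for t, g in zip(totals, grad)]
    # Scale the averaged path gradient by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]


# Usage with hypothetical weights, input, and an all-zero baseline.
w, b = [1.5, -2.0, 0.5], 0.1
x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
```

Summing `attributions` should give approximately `logistic_model(x, w, b) - logistic_model(baseline, w, b)`, the completeness property that makes IG attributions easy to sanity-check. SmoothGrad, the other method named above, instead averages gradients over noisy copies of the input and is not shown.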

Preprint · 2021 · arXiv

Lifelong Learning based Disease Diagnosis on Clinical Notes

Current deep-learning-based disease diagnosis systems usually suffer from catastrophic forgetting, i.e., directly fine-tuning the disease diagnosis model on new tasks usually leads to an abrupt decay of performance on previous tasks. Worse, a trained diagnosis system is fixed once deployed, yet collecting training data that covers enough diseases is infeasible, which motivates us to develop a lifelong learning diagnosis system. In this work, we propose to use attention to combine medical entities and context, and to embed episodic memory and consolidation to retain knowledge, so that the learned model can adapt to sequential disease-diagnosis tasks. Moreover, we establish a new benchmark, named Jarvis-40, which contains clinical notes collected from various hospitals. Our experiments show that the proposed method achieves state-of-the-art performance on the proposed benchmark.

Preprint · 2021 · arXiv

Online Disease Self-diagnosis with Inductive Heterogeneous Graph Convolutional Networks

We propose a Healthcare Graph Convolutional Network (HealGCN) to offer a disease self-diagnosis service for online users based on Electronic Healthcare Records (EHRs). This paper focuses on two main challenges in online disease diagnosis: (1) serving cold-start users via graph convolutional networks and (2) handling scarce clinical descriptions via a symptom retrieval system. To this end, we first organize the EHR data into a heterogeneous graph that models complex interactions among users, symptoms, and diseases, and tailor graph representation learning to disease diagnosis with an inductive learning paradigm. Then, we build a disease self-diagnosis system with a corresponding EHR Graph-based Symptom Retrieval System (GraphRet) that can search for and provide a list of relevant alternative symptoms by tracing predefined meta-paths. When users provide scarce descriptions, GraphRet enriches the seed symptom set through the EHR graph, hence yielding better diagnosis accuracy. Finally, we validate the superiority of our model on a large-scale EHR dataset.
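GraphRet is described as retrieving alternative symptoms by tracing predefined meta-paths over the EHR graph. A minimal sketch of that idea is below, assuming a single Symptom-Disease-Symptom meta-path, a toy disease-symptom table, and ranking by the number of path instances. The data, the meta-path, the ranking rule, and all names are hypothetical; the abstract does not list GraphRet's meta-paths or scoring.

```python
from collections import Counter

# Hypothetical toy EHR graph: disease -> set of associated symptoms.
DISEASE_SYMPTOMS = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing", "sore throat"},
    "migraine": {"headache", "nausea"},
}


def build_symptom_index(disease_symptoms):
    """Invert the edges so each symptom points to the diseases it is linked to."""
    index = {}
    for disease, symptoms in disease_symptoms.items():
        for s in symptoms:
            index.setdefault(s, set()).add(disease)
    return index


def retrieve_symptoms(seed, disease_symptoms, top_k=3):
    """Follow Symptom -> Disease -> Symptom paths from the seed set; rank by path count."""
    index = build_symptom_index(disease_symptoms)
    counts = Counter()
    for s in seed:
        for disease in index.get(s, ()):
            for candidate in disease_symptoms[disease]:
                if candidate not in seed:
                    counts[candidate] += 1
    # Highest path count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symptom for symptom, _ in ranked][:top_k]


# Usage: a user who reported only "cough" and "fever".
suggestions = retrieve_symptoms({"cough", "fever"}, DISEASE_SYMPTOMS)
# suggestions == ["fatigue", "sneezing", "sore throat"]
```

"fatigue" ranks first because two seed symptoms reach it through "flu"; a real system would likely weight paths by learned or frequency-based scores rather than raw counts.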

Preprint · 2014 · arXiv

The Linear Information Coupling Problems

Many network information theory problems face a similar difficulty: single-letterization. We argue that this is due to the lack of a geometric structure on the space of probability distributions. In this paper, we develop such a structure by assuming that the distributions of interest are close to each other. Under this assumption, the K-L divergence reduces to the squared Euclidean metric in a Euclidean space. In addition, we construct the notions of coordinates and inner products, which facilitate solving communication problems. We present applications of this approach to the point-to-point channel, the general broadcast channel, and the multiple access channel (MAC) with a common source. It can be shown that with this approach, information theory problems such as single-letterization can be reduced to linear algebra problems. Moreover, we show that for the general broadcast channel, transmitting the common message to the receivers can be formulated as a trade-off between linear systems. We also provide an example that visualizes this trade-off geometrically. Finally, for the MAC with a common source, we observe a coherent combining gain due to cooperation between the transmitters, and this gain can be quantified by applying our technique.
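The local approximation in the abstract can be checked numerically. For $P = Q + \epsilon J$ with $\sum_x J(x) = 0$, a second-order Taylor expansion gives $D(P\|Q) = \frac{1}{2}\sum_x \frac{(P(x)-Q(x))^2}{Q(x)} + o(\epsilon^2)$, i.e. half the squared Euclidean norm of $\phi(x) = (P(x)-Q(x))/\sqrt{Q(x)}$. The sketch below compares both sides for a shrinking $\epsilon$; the distribution $Q$ and direction $J$ are arbitrary illustrative choices, and the $\phi$ normalization is the standard one for this expansion rather than something stated in the abstract.

```python
import math


def kl_divergence(p, q):
    """K-L divergence D(P || Q) in nats on a finite alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def local_squared_norm(p, q):
    """(1/2) * ||phi||^2 with phi(x) = (P(x) - Q(x)) / sqrt(Q(x))."""
    return 0.5 * sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def perturb(q, direction, eps):
    """P = Q + eps * J; the direction J must sum to zero so P stays normalized."""
    return [qi + eps * ji for qi, ji in zip(q, direction)]


q = [0.5, 0.3, 0.2]
j = [1.0, -0.5, -0.5]  # illustrative perturbation direction, sums to zero
for eps in (0.1, 0.01, 0.001):
    p = perturb(q, j, eps)
    ratio = kl_divergence(p, q) / local_squared_norm(p, q)
    print(f"eps={eps}: D(P||Q) / (0.5*||phi||^2) = {ratio:.6f}")
```

The printed ratio should approach 1 as `eps` shrinks, with a relative error of order `eps`, which is the sense in which divergence becomes a squared Euclidean distance between nearby distributions.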

Preprint · 2013 · arXiv

On Locally Decodable Source Coding

Locally decodable channel codes form a special class of error-correcting codes with the property that the decoder can reconstruct any bit of the input message by querying only a few bits of a noisy codeword. It is well known that such codes require significantly more redundancy (in particular, they have vanishing rate) than their non-local counterparts. In this paper, we define a dual problem, i.e., locally decodable source codes (LDSC). We consider both the almost lossless (block error) and lossy (bit error) cases. In the almost lossless case, we show that optimal compression (to entropy) is possible with $O(\log n)$ queries by the decompressor to the compressed string. We also show the following converse bounds: 1) linear LDSC cannot achieve any rate below one with a bounded number of queries; 2) the rate of any source code with a linear decoder (not necessarily local) is one; 3) for 2 queries, no code construction can have a rate below one. In the lossy case, we show that any rate above the rate-distortion function is achievable with a bounded number of queries. We also show that the rate-distortion function itself is achievable with any growing number of queries. We provide an achievability bound in the finite block-length regime and compare it with existing bounds in the succinct data structures literature.

Preprint · 2012 · arXiv

Linear Information Coupling Problems

Many network information theory problems face a similar difficulty: single-letterization. We argue that this is due to the lack of a geometric structure on the space of probability distributions. In this paper, we develop such a structure by assuming that the distributions of interest are close to each other. Under this assumption, the K-L divergence reduces to the squared Euclidean metric in a Euclidean space. Moreover, we construct the notions of coordinates and inner products, which facilitate solving communication problems. We also present applications of this approach to the point-to-point channel and the general broadcast channel, which demonstrate how our technique simplifies information theory problems.

Preprint · 2011 · arXiv

Proof of the outage probability conjecture for MISO channels

Telatar (1999) conjectured that the covariance matrices minimizing the outage probability for MIMO channels with Gaussian fading are diagonal, with either zeros or constant values on the diagonal. In the MISO setting, this is equivalent to conjecturing that the Gaussian quadratic forms with the largest tail probability correspond to such diagonal matrices. We prove the conjecture in the MISO setting.
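The outage probability in the conjecture can be estimated by simulation. Assuming i.i.d. $\mathcal{CN}(0,1)$ fading and unit noise power, a MISO channel with input covariance $Q$ is in outage at rate $R$ when $\log_2(1 + h^\dagger Q h) < R$. For the diagonal candidates in the conjecture, $Q = (P/k)\,\mathrm{diag}(1,\dots,1,0,\dots,0)$ with $k$ nonzero entries, each $|h_i|^2$ is Exponential(1), so the sketch below samples only those terms. The power, rate, and antenna count in the usage lines are hypothetical, and the simulation only illustrates the quantity being optimized; it plays no part in the proof.

```python
import math
import random


def outage_probability(k, power, rate, trials=20000, rng=None):
    """Monte Carlo estimate of P(log2(1 + h^H Q h) < rate) for Q = (power / k) * diag(1,...,1,0,...,0).

    Assumes i.i.d. CN(0, 1) fading, so each |h_i|^2 is Exponential(1), and unit noise power.
    Only the k active antennas affect h^H Q h, so inactive ones are not simulated.
    """
    rng = rng or random.Random(0)
    threshold = 2.0 ** rate - 1.0  # outage iff h^H Q h < 2^R - 1
    outages = 0
    for _ in range(trials):
        gain = (power / k) * sum(rng.expovariate(1.0) for _ in range(k))
        if gain < threshold:
            outages += 1
    return outages / trials


# Usage: uniform power over k of 4 transmit antennas (hypothetical power and rate).
for k in range(1, 5):
    print(k, outage_probability(k, power=10.0, rate=2.0))
```

For $k = 1$ the estimate can be checked against the closed form $1 - e^{-(2^R - 1)/P}$; for larger $k$ the active gain is a scaled Gamma($k$, 1) variable, which is what makes the equal-power diagonal family easy to compare.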