Researcher profile

Bo Bai

Bo Bai contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2026arXiv

DB3 Team's Solution For Meta KDD Cup' 25

This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark (CRAG-MM), we developed a comprehensive framework that integrates tailored retrieval pipelines for different tasks with a unified LLM-tuning approach for hallucination control. Our solution features (1) domain-specific retrieval pipelines handling image-indexed knowledge graphs, web sources, and multi-turn conversations; and (2) advanced refusal training using SFT, DPO, and RL. The system achieved 2nd place in Task 1, 2nd place in Task 2, and 1st place in Task 3, securing the grand prize for excellence in ego-centric queries through superior handling of first-person perspective challenges.

preprint2026arXiv

Memo-SQL: Structured Decomposition and Experience-Driven Self-Correction for Training-Free NL2SQL

Existing NL2SQL systems face two critical limitations: (1) they rely on in-context learning with only correct examples, overlooking the rich signal in historical error-fix pairs that could guide more robust self-correction; and (2) test-time scaling approaches often decompose questions arbitrarily, producing near-identical SQL candidates across runs and diminishing ensemble gains. Moreover, these methods suffer from a stark accuracy-efficiency trade-off: high performance demands excessive computation, while fast variants compromise quality. We present Memo-SQL, a training-free framework that addresses these issues through two simple ideas: structured decomposition and experience-aware self-correction. Instead of leaving decomposition to chance, we apply three clear strategies, entity-wise, hierarchical, and atomic sequential, to encourage diverse reasoning. For correction, we build a dynamic memory of both successful queries and historical error-fix pairs, and use retrieval-augmented prompting to bring relevant examples into context at inference time, no fine-tuning or external APIs required. On BIRD, Memo-SQL achieves 68.5% execution accuracy, setting a new state of the art among open, zero-fine-tuning methods, while using over 10 times fewer resources than prior TTS approaches.

preprint2022arXiv

An Efficient Two-Stage SPARC Decoder for Massive MIMO Unsourced Random Access

In this paper, we study a concatenate coding scheme based on sparse regression code (SPARC) and tree code for unsourced random access in massive multiple-input and multiple-output systems. Our focus is concentrated on efficient decoding for the inner SPARC with practical concerns. A two-stage method is proposed to achieve near-optimal performance while maintaining low computational complexity. Specifically, a one-step thresholding-based algorithm is first used for reducing large dimensions of the SPARC decoding, after which a relaxed maximum-likelihood estimator is employed for refinement. Adequate simulation results are provided to validate the near-optimal performance and the low computational complexity. Besides, for covariance-based sparse recovery method, theoretical analyses are given to characterize the upper bound of the number of active users supported when convex relaxation is considered, and the probability of successful dimension reduction by the one-step thresholding-based algorithm.

preprint2022arXiv

An Optimal Transport Approach to the Computation of the LM Rate

Mismatch capacity characterizes the highest information rate for a channel under a prescribed decoding metric, and is thus a highly relevant fundamental performance metric when dealing with many practically important communication scenarios. Compared with the frequently used generalized mutual information (GMI), the LM rate has been known as a tighter lower bound of the mismatch capacity. The computation of the LM rate, however, has been a difficult task, due to the fact that the LM rate involves a maximization over a function of the channel input, which becomes challenging as the input alphabet size grows, and direct numerical methods (e.g., interior point methods) suffer from intensive memory and computational resource requirements. Noting that the computation of the LM rate can also be formulated as an entropy-based optimization problem with constraints, in this work, we transform the task into an optimal transport (OT) problem with an extra constraint. This allows us to efficiently and accurately accomplish our task by using the well-known Sinkhorn algorithm. Indeed, only a few iterations are required for convergence, due to the fact that the formulated problem does not contain additional regularization terms. Moreover, we convert the extra constraint into a root-finding procedure for a one-dimensional monotonic function. Numerical experiments demonstrate the feasibility and efficiency of our OT approach to the computation of the LM rate.

preprint2022arXiv

Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation

With the development of deep learning techniques, the combination of deep learning with image compression has drawn lots of attention. Recently, learned image compression methods had exceeded their classical counterparts in terms of rate-distortion performance. However, continuous rate adaptation remains an open question. Some learned image compression methods use multiple networks for multiple rates, while others use one single model at the expense of computational complexity increase and performance degradation. In this paper, we propose a continuously rate adjustable learned image compression framework, Asymmetric Gained Variational Autoencoder (AG-VAE). AG-VAE utilizes a pair of gain units to achieve discrete rate adaptation in one single model with a negligible additional computation. Then, by using exponential interpolation, continuous rate adaptation is achieved without compromising performance. Besides, we propose the asymmetric Gaussian entropy model for more accurate entropy estimation. Exhaustive experiments show that our method achieves comparable quantitative performance with SOTA learned image compression methods and better qualitative performance than classical image codecs. In the ablation study, we confirm the usefulness and superiority of gain units and the asymmetric Gaussian entropy model.

preprint2022arXiv

CGN: A Capacity-Guaranteed Network Architecture for Future Ultra-Dense Wireless Systems

The sixth generation (6G) era is envisioned to be a fully intelligent and autonomous era, with physical and digital lifestyles merged together. Future wireless network architectures should provide a solid support for such new lifestyles. A key problem thus arises that what kind of network architectures are suitable for 6G. In this paper, we propose a capacity-guaranteed network (CGN) architecture, which provides high capacity for wireless devices densely distributed everywhere, and ensures a superior scalability with low signaling overhead and computation complexity simultaneously. Our theorem proves that the essence of a CGN architecture is to decompose the whole network into non-overlapping clusters with equal cluster sum capacity. Simulation results reveal that in terms of the minimum cluster sum capacity, the proposed CGN can achieve at least 30% performance gain compared with existing base station clustering (BS-clustering) architectures. In addition, our theorem is sufficiently general and can be applied for networks with different distributions of BSs and users.

preprint2022arXiv

Fast Sinkhorn II: Collinear Triangular Matrix and Linear Time Accurate Computation of Optimal Transport

In our previous work [arXiv:2202.10042], the complexity of Sinkhorn iteration is reduced from $O(N^2)$ to the optimal $O(N)$ by leveraging the special structure of the kernel matrix. In this paper, we explore the special structure of kernel matrices by defining and utilizing the properties of the Lower-ColLinear Triangular Matrix (L-CoLT matrix) and Upper-ColLinear Triangular Matrix (U-CoLT matrix). We prove that (1) L/U-CoLT matrix-vector multiplications can be carried out in $O(N)$ operations; (2) both families of matrices are closed under the Hadamard product and matrix scaling. These properties help to alleviate two key difficulties for reducing the complexity of the Inexact Proximal point method (IPOT), and allow us to significantly reduce the number of iterations to $O(N)$. This yields the Fast Sinkhorn II (FS-2) algorithm for accurate computation of optimal transport with low algorithm complexity and fast convergence. Numerical experiments are presented to show the effectiveness and efficiency of our approach.

preprint2022arXiv

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

Pre-training over mixtured multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, that pre-trains a task-level dynamic network, which is composed by knowledge "legos" in each layers, on labeled multi-task and multi-domain datasets. By inspecting humans' innate ability to learn in complex environment, we recognize and transfer three critical elements to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch. (2) partitioned knowledge storage in separate lego units driven by knowledge sharing. (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Noteworthy, the joint training of disparate vision tasks is non-trivial due to their differences in input shapes, loss functions, output formats, data distributions, etc. Therefore, we innovatively develop a plug-and-play multi-task training algorithm, which supports Single Iteration Multiple Tasks (SIMT) concurrently training. SIMT lays the foundation of pre-training with large-scale multi-task multi-domain datasets and is proved essential for stable training in our GPPF experiments. Excitingly, the exhaustive experiments show that, our GPPF-R50 model achieves significant improvements of 2.5-5.8 over a strong baseline of the 8 pre-training tasks in GPPF-15M and harvests a range of SOTAs over the 22 downstream tasks with similar computation budgets. We also validate the generalization ability of GPPF to SOTA vision transformers with consistent improvements. These solid experimental results fully prove the effective knowledge learning, storing, sharing, and transfer provided by our novel GPPF framework.

preprint2022arXiv

Semantic Compression with Side Information: A Rate-Distortion Perspective

We consider the semantic rate-distortion problem motivated by task-oriented video compression. The semantic information corresponding to the task, which is not observable to the encoder, shows impacts on the observations through a joint probability distribution. The similarities among intra-frame segments and inter-frames in video compression are formulated as side information available at both the encoder and the decoder. The decoder is interested in recovering the observation and making an inference of the semantic information under certain distortion constraints. We establish the information-theoretic limits for the tradeoff between compression rates and distortions by fully characterizing the rate-distortion function. We further evaluate the rate-distortion function under specific Markov conditions for three scenarios: i) both the task and the observation are binary sources; ii) the task is a binary classification of an integer observation as even and odd; iii) Gaussian correlated task and observation. We also illustrate through numerical results that recovering only the semantic information can reduce the coding rate comparing to recovering the source observation.

preprint2022arXiv

Tail Quantile Estimation for Non-preemptive Priority Queues

Motivated by applications in computing and telecommunication systems, we investigate the problem of estimating p-quantile of steady-state sojourn times in a single-server multi-class queueing system with non-preemptive priorities for p close to 1. The main challenge in this problem lies in efficient sampling from the tail event. To address this issue, we develop a regenerative simulation algorithm with importance sampling. In addition, we establish a central limit theorem for the estimator to construct the confidence interval. Numerical experiments show that our algorithm outperforms benchmark simulation methods. Our result contributes to the literature on rare event simulation for queueing systems.

preprint2022arXiv

The Moment Passing Method for Wireless Channel Capacity Estimation

Wireless network capacity can be regarded as the most important performance metric for wireless communication systems. With the fast development of wireless communication technology, future wireless systems will become more and more complicated. As a result, the channel gain matrix will become a large-dimensional random matrix, leading to an extremely high computational cost to obtain the capacity. In this paper, we propose a moment passing method (MPM) to realize the fast and accurate capacity estimation for future ultra-dense wireless systems. It can determine the capacity with quadratic complexity, which is optimal considering that the cost of a single matrix operation is not less than quadratic complexity. Moreover, it has high accuracy. The simulation results show that the estimation error of this method is below 2 percent. Finally, our method is highly general, as it is independent of the distributions of BSs and users, and the shape of network areas. More importantly, it can be applied not only to the conventional multi-user multiple input and multiple output (MU-MIMO) networks, but also to the capacity-centric networks designed for B5G/6G.

preprint2022arXiv

Two New Piggybacking Designs with Lower Repair Bandwidth

Piggybacking codes are a special class of MDS array codes that can achieve small repair bandwidth with small sub-packetization by first creating some instances of an $(n,k)$ MDS code, such as a Reed-Solomon (RS) code, and then designing the piggyback function. In this paper, we propose a new piggybacking coding design which designs the piggyback function over some instances of both $(n,k)$ MDS code and $(n,k&#39;)$ MDS code, when $k\geq k&#39;$. We show that our new piggybacking design can significantly reduce the repair bandwidth for single-node failures. When $k=k&#39;$, we design piggybacking code that is MDS code and we show that the designed code has lower repair bandwidth for single-node failures than all existing piggybacking codes when the number of parity node $r=n-k\geq8$ and the sub-packetization $α<r$. Moreover, we propose another piggybacking codes by designing $n$ piggyback functions of some instances of $(n,k)$ MDS code and adding the $n$ piggyback functions into the $n$ newly created empty entries with no data symbols. We show that our code can significantly reduce repair bandwidth for single-node failures at a cost of slightly more storage overhead. In addition, we show that our code can recover any $r+1$ node failures for some parameters. We also show that our code has lower repair bandwidth than locally repairable codes (LRCs) under the same fault-tolerance and redundancy for some parameters.

preprint2021arXiv

A Convergent Semi-Proximal Alternating Direction Method of Multipliers for Recovering Internet Traffics from Link Measurements

It is challenging to recover the large-scale internet traffic data purely from the link measurements. With the rapid growth of the problem scale, it will be extremely difficult to sustain the recovery accuracy and the computational cost. In this work, we propose a new Sparsity Low-Rank Recovery (SLRR) model and its Schur Complement Based semi-proximal Alternating Direction Method of Multipliers (SCB-spADMM) solver. Our approach distinguishes itself mainly for the following two aspects. First, we fully exploit the spatial low-rank property and the sparsity of traffic data, which are barely considered in the literature. Our model can be divided into a series of subproblems, which only relate to the traffics in a certain individual time interval. Thus, the model scale is significantly reduced. Second, we establish a globally convergent ADMM-type algorithm inspired by [Li et al., Math. Program., 155(2016)] to solve the SLRR model. In each iteration, all the intermediate variables&#39; optimums can be calculated analytically, which makes the algorithm fast and accurate. Besides, due to the separability of the SLRR model, it is possible to design a parallel algorithm to further reduce computational time. According to the numerical results on the classic datasets Abilene and GEANT, our method achieves the best accuracy with a low computational cost. Moreover, in our newly released large-scale Huawei Origin-Destination (HOD) network traffics, our method perfectly reaches the seconds-level feedback, which meets the essential requirement for practical scenarios.

preprint2021arXiv

Lower Bound on the Optimal Access Bandwidth of ($K+2,K,2$)-MDS Array Code with Degraded Read Friendly

Accessing the data in the failed disk (degraded read) with low latency is crucial for an erasure-coded storage system. In this work, the maximum distance separable (MDS) array code with the property of degraded-read friendly (DRF) is discussed. For the DRF MDS array code with 2 redundant nodes and the sub-packetization level of 2, the lower bound of its access bandwidth is derived.

preprint2021arXiv

Structural Entropy of the Stochastic Block Models

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes of context-dependent data is extremely desirable. A particularly interesting direction is to compress the data while keeping the &#34;structural information&#34; only and ignoring the concrete labelings. Under this direction, Choi and Szpankowski introduced the structures (unlabeled graphs) which allowed them to compute the structural entropy of the Erdős--Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expectation in linear time. In this paper, we consider the Stochastic Block Models with an arbitrary number of parts. Indeed, we define a partitioned structural entropy for Stochastic Block Models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the Stochastic Block Models, and provide a compression scheme that asymptotically achieves this entropy limit.

preprint2020arXiv

Deep Reinforcement Learning for Fresh Data Collection in UAV-assisted IoT Networks

Due to the flexibility and low operational cost, dispatching unmanned aerial vehicles (UAVs) to collect information from distributed sensors is expected to be a promising solution in Internet of Things (IoT), especially for time-critical applications. How to maintain the information freshness is a challenging issue. In this paper, we investigate the fresh data collection problem in UAV-assisted IoT networks. Particularly, the UAV flies towards the sensors to collect status update packets within a given duration while maintaining a non-negative residual energy. We formulate a Markov Decision Process (MDP) to find the optimal flight trajectory of the UAV and transmission scheduling of the sensors that minimizes the weighted sum of the age of information (AoI). A UAV-assisted data collection algorithm based on deep reinforcement learning (DRL) is further proposed to overcome the curse of dimensionality. Extensive simulation results demonstrate that the proposed DRL-based algorithm can significantly reduce the weighted sum of the AoI compared to other baseline algorithms.