Source author record

Fan Bai

Fan Bai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision; Artificial Intelligence; Biological Physics; Computation and Language; Distributed, Parallel, and Cluster Computing; Machine Learning; math.AP; Robotics; Subcellular Processes

Catalog footprint

What is connected

8 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

EPD-Serve: A Flexible Multimodal EPD Disaggregation Inference Serving System On Ascend

With the widespread adoption of large multimodal models, efficient inference across text, image, audio, and video modalities has become critical. However, existing multimodal inference systems typically employ monolithic architectures that tightly couple the Encode, Prefill, and Decode stages on homogeneous hardware, neglecting the heterogeneous computational characteristics of each stage. This design leads to inefficient resource utilization and limited system throughput. To address these issues, we propose EPD-Serve, a stage-level disaggregated inference serving system for multimodal models. EPD-Serve decouples the inference pipeline into independent Encode, Prefill, and Decode stages, enabling logical isolation and flexible co-located deployment through dynamic orchestration. Leveraging the Ascend interconnect topology, EPD-Serve introduces asynchronous feature prefetching between Encode and Prefill stages and a hierarchical grouped KV cache transmission mechanism between Prefill and Decode stages to improve cross-node communication efficiency. In addition, EPD-Serve incorporates multi-route scheduling, instance-level load balancing, and multi-stage hardware co-location with spatial multiplexing to better support diverse multimodal workloads. Comprehensive experiments on multimodal understanding models demonstrate that, under high-concurrency scenarios, EPD-Serve improves end-to-end throughput by 57.37-69.48% compared to PD-disaggregated deployment, while satisfying strict SLO constraints, including TTFT below 2000 ms and TPOT below 50 ms. These results highlight the effectiveness of stage-level disaggregation for optimizing multimodal large model inference systems.
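The SLO thresholds quoted in the abstract (TTFT below 2000 ms, TPOT below 50 ms) can be checked per request from token arrival times. The sketch below is not taken from EPD-Serve. It assumes the common definitions of TTFT (time to first token) and TPOT (mean time per output token after the first), and the function names and input format are hypothetical.

```python
from typing import Sequence

TTFT_LIMIT_MS = 2000.0  # threshold quoted in the abstract
TPOT_LIMIT_MS = 50.0    # threshold quoted in the abstract


def ttft_ms(request_start_ms: float, token_times_ms: Sequence[float]) -> float:
    """Time to first token: delay from request arrival to the first output token."""
    if not token_times_ms:
        raise ValueError("at least one output token timestamp is required")
    return token_times_ms[0] - request_start_ms


def tpot_ms(token_times_ms: Sequence[float]) -> float:
    """Mean time per output token after the first (0.0 for a single token)."""
    if not token_times_ms:
        raise ValueError("at least one output token timestamp is required")
    if len(token_times_ms) < 2:
        return 0.0
    decode_span = token_times_ms[-1] - token_times_ms[0]
    return decode_span / (len(token_times_ms) - 1)


def meets_slo(request_start_ms: float, token_times_ms: Sequence[float]) -> bool:
    """True when both assumed SLO thresholds hold for one request."""
    return (
        ttft_ms(request_start_ms, token_times_ms) < TTFT_LIMIT_MS
        and tpot_ms(token_times_ms) < TPOT_LIMIT_MS
    )
```

For example, a request whose tokens arrive at 1500, 1540 and 1580 ms after submission has a TTFT of 1500 ms and a TPOT of 40 ms, so it satisfies both limits under these definitions.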

preprint · 2022 · arXiv

C3-STISR: Scene Text Image Super-resolution with Triple Clues

Scene text image super-resolution (STISR) has been regarded as an important pre-processing task for text recognition from low-resolution scene text images. Most recent approaches use the recognizer's feedback as clues to guide super-resolution. However, directly using the recognition clue has two problems: 1) Compatibility. It takes the form of a probability distribution and has an obvious modal gap with STISR, a pixel-level task; 2) Inaccuracy. It usually contains wrong information, and thus will mislead the main task and degrade super-resolution performance. In this paper, we present a novel method, C3-STISR, that jointly exploits the recognizer's feedback, visual information, and linguistic information as clues to guide super-resolution. Here, the visual clue comes from images of the texts predicted by the recognizer, which are informative and more compatible with the STISR task, while the linguistic clue is generated by a pre-trained character-level language model, which is able to correct the predicted texts. We design effective extraction and fusion mechanisms for the triple cross-modal clues to generate comprehensive and unified guidance for super-resolution. Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance. Code is available at https://github.com/zhaominyiz/C3-STISR.

preprint · 2022 · arXiv

Pre-train or Annotate? Domain Adaptation with a Constrained Budget

Recent work has demonstrated that pre-training in-domain language models can boost performance when adapting to a new domain. However, the costs associated with pre-training raise an important question: given a fixed budget, what steps should an NLP practitioner take to maximize performance? In this paper, we view domain adaptation with a constrained budget as a consumer choice problem, where the goal is to select an optimal combination of data annotation and pre-training. We measure annotation costs of three procedural text datasets, along with the pre-training costs of several in-domain language models. The utility of different combinations of pre-training and data annotation is evaluated under varying budget constraints to assess which combination strategy works best. We find that for small budgets, spending all funds on annotation leads to the best performance; once the budget becomes large enough, however, a combination of data annotation and in-domain pre-training yields better performance. Our experiments suggest task-specific data annotation should be part of an economical strategy when adapting an NLP model to a new domain.

preprint · 2022 · arXiv

RASEC: Rescaling Acquisition Strategy with Energy Constraints under SE-OU Fusion Kernel for Active Trachea Palpation and Incision Recommendation in Laryngeal Region

A novel palpation-based incision detection strategy in the laryngeal region, potentially for robotic tracheotomy, is proposed in this letter. A tactile sensor is introduced to measure tissue hardness in the specific laryngeal region by gentle contact. A kernel fusion method is proposed that combines the Squared Exponential (SE) kernel with the Ornstein-Uhlenbeck (OU) kernel to address the drawback that existing kernel functions are not sufficiently optimal in this scenario. Moreover, we further regularize the exploration factor and the greed factor, and the tactile sensor's moving distance and the robotic base link's rotation angle during the incision localization process are considered as new factors in the acquisition strategy. We conducted simulation and physical experiments to compare the newly proposed algorithm, Rescaling Acquisition Strategy with Energy Constraints (RASEC), with current palpation-based acquisition strategies in trachea detection. The results indicate that the proposed acquisition strategy with the fusion kernel can successfully localize the incision with the highest algorithm performance (Average Precision 0.932, Average Recall 0.973, Average F1 score 0.952). During the robotic palpation process, the cumulative moving distance is reduced by 50%, and the cumulative rotation angle is reduced by 71.4%, with no sacrifice in overall performance. This shows that RASEC can efficiently suggest the incision zone in the laryngeal region and greatly reduce energy loss.
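The abstract does not say how the SE and OU kernels are fused, so the following is only an illustrative sketch under a stated assumption: a convex combination of the two kernels, which is itself a valid covariance function because non-negative weighted sums of positive semi-definite kernels remain positive semi-definite. The `weight`, `length_scale` and `variance` parameters and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def _sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def se_kernel(x, y, length_scale=1.0, variance=1.0):
    """Squared Exponential kernel: variance * exp(-d^2 / (2 * l^2))."""
    return variance * math.exp(-_sq_dist(x, y) / (2.0 * length_scale ** 2))


def ou_kernel(x, y, length_scale=1.0, variance=1.0):
    """Ornstein-Uhlenbeck kernel: variance * exp(-d / l)."""
    return variance * math.exp(-math.sqrt(_sq_dist(x, y)) / length_scale)


def fused_kernel(x, y, weight=0.5, length_scale=1.0, variance=1.0):
    """ASSUMED fusion rule: weight * SE + (1 - weight) * OU, with 0 <= weight <= 1."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (
        weight * se_kernel(x, y, length_scale, variance)
        + (1.0 - weight) * ou_kernel(x, y, length_scale, variance)
    )
```

Under this assumption, the SE term contributes smooth long-range correlation, while the OU term allows rougher, sharper changes between nearby probe points. Other fusion rules, such as a product of the two kernels, would also yield a valid kernel.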

preprint · 2020 · arXiv

Text Recognition in Real Scenarios with a Few Labeled Samples

Scene text recognition (STR) is still a hot research topic in the computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not readily applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are scarce. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each character's feature representation with an attention mechanism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach can maximize the character-level confusion between the source domain and the target domain, thus achieving sequence-level adaptation with even a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the finetuning scheme, and obtains comparable performance to the state-of-the-art STR methods.

preprint · 2016 · arXiv

Online Learning for Wireless Distributed Computing

There has been growing interest in Wireless Distributed Computing (WDC), which leverages collaborative computing over multiple wireless devices. WDC enables complex applications that a single device cannot support individually. However, the problem of assigning tasks over multiple devices becomes challenging in the dynamic environments encountered in real-world settings, considering that the resource availability and channel conditions change over time in unpredictable ways due to mobility and other factors. In this paper, we formulate a task assignment problem as an online learning problem using an adversarial multi-armed bandit framework. We propose MABSTA, a novel online learning algorithm that learns the performance of unknown devices and channel qualities continually through exploratory probing and makes task assignment decisions by exploiting the gained knowledge. For maximal adaptability, MABSTA is designed to make no stochastic assumption about the environment. We analyze it mathematically and provide a worst-case performance guarantee for any dynamic environment. We also compare it with the optimal offline policy as well as other baselines via emulations on trace data obtained from a wireless IoT testbed, and show that it offers competitive and robust performance in all cases. To the best of our knowledge, MABSTA is the first online algorithm in this domain of task assignment problems, and it provides a provable performance guarantee.
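The abstract frames task assignment as an adversarial multi-armed bandit but does not describe MABSTA's internals. As a generic illustration of that framework only, the sketch below implements Exp3, a standard adversarial bandit algorithm, substituted here for illustration and not the authors' method. Treating each arm as a candidate device, the reward scale in [0, 1], and all names and parameters are assumptions.

```python
import math
import random


class Exp3:
    """Exp3 bandit: exponential weights with uniform exploration (not MABSTA)."""

    def __init__(self, n_arms: int, gamma: float = 0.1, seed=None):
        if n_arms < 1 or not 0.0 < gamma <= 1.0:
            raise ValueError("need n_arms >= 1 and 0 < gamma <= 1")
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate
        self.weights = [1.0] * n_arms   # one weight per arm (e.g. per device)
        self._rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalised weights with a uniform distribution."""
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select(self) -> int:
        """Sample an arm according to the current probabilities."""
        arms = range(self.n_arms)
        return self._rng.choices(arms, weights=self.probabilities())[0]

    def update(self, arm: int, reward: float) -> None:
        """Update the played arm using an importance-weighted reward in [0, 1]."""
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must lie in [0, 1]")
        estimate = reward / self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale by the largest weight; probabilities are unchanged,
        # but this keeps the weights from overflowing on long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]
```

A typical loop calls `select()` to pick a device, observes a reward normalised to [0, 1] (for instance, one minus a scaled latency, which is a hypothetical choice), and passes it to `update()`. Because no distribution is assumed for the rewards, the same loop applies when device and channel quality drift over time.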

preprint · 2015 · arXiv

Solving some Navier-Stokes Equations with the initial conditions being some complex-valued periodic functions on $R^3$

In this paper, we utilize some series and an iterative method to solve some Navier-Stokes equations with the initial conditions being some complex-valued periodic functions on $R^3$. Then a new strategy for dealing with the conjecture of the Navier-Stokes equation is given.

preprint · 2013 · arXiv

Coupling between switching regulation and torque generation in bacterial flagellar motor

The bacterial flagellar motor plays a crucial role in both bacterial locomotion and chemotaxis. Recent experiments reveal that the switching dynamics of the motor depends on the motor rotation speed, and thus the motor torque, non-monotonically. Here we present a unified mathematical model which models motor torque generation based on experimental torque-speed curves and torque-dependent switching based on the conformational spread model. The model successfully reproduces the observed switching rate as a function of the rotation speed, and provides a generic physical explanation independent of most details. A stator affects the switching dynamics through two mechanisms: accelerating the conformation flipping rates of individual rotor switching units, which favours slower motor speed and thus increasing torque; and affecting more switching units within unit time, which favours faster speed. Consequently, the switching rate shows a maximum at intermediate speed. Our model predicts that a motor switches more often with more stators. The load-switching relation may serve as a mechanism for sensing the physical environment, similar to the chemotaxis system for sensing the chemical environment. It may also coordinate the switch dynamics of motors within a cell.
