Source author record

Shaohua Wu

Shaohua Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Information Theory; math.IT; Computational Engineering, Finance, and Science; cond-mat.mtrl-sci; eess.SP; eess.SY; Multiagent Systems; Software Engineering; Systems and Control

Catalog footprint

What is connected

4 works

9 topics

4 close collaborators


Preprint · 2026 · arXiv

A Chromatographic Process Design and Optimization Platform Powered by Large Language Models: A Case Application on Extract of Ginkgo Biloba Leaf

Chromatographic separation technology has been widely applied in pharmaceutical, chemical, and food industries due to its high efficiency. However, traditional human-dependent chromatographic process development faces challenges such as reliance on expert experience, long development cycles, and labor intensity. ChromR, a large language model (LLM)-driven platform for chromatographic process design and optimization, is presented in this work. The platform integrates ChromLLM, a domain-specific LLM trained for chromatography, along with a multi-agent system and an automated chromatographic experimental device. The multi-agent system comprises four agents: domain knowledge answering, experimental design, experimental execution, and data analysis. ChromR enables automatic completion of the entire workflow, including initial process parameter recommendation, experimental design, automated execution, data analysis, and multi-objective optimization. By utilizing ChromR, dependency on expert knowledge is effectively reduced, while labor input and development time are significantly decreased. Chromatographic purification of the extract of Ginkgo biloba leaf (EGBL) was selected as a case study. ChromR successfully developed a chromatographic process within one week that meets multiple objectives, including fraction quality and production efficiency, reducing development time to approximately one-seventh of that required by the conventional paradigm. An intelligent, automated, and universally applicable new paradigm was established for chromatographic process development.

Preprint · 2026 · arXiv

A Generalizable Framework for Building Executable Domain-Specific LLMs under Data Scarcity: Demonstration on Semiconductor TCAD Simulation

Scientific and engineering verticals often suffer from data scarcity and strict executability requirements: models must generate not only fluent text, but also syntactically valid, tool-compilable scripts. We present a schema-first alignment framework for building compact, executable domain-specific LLMs in low-resource settings. The framework integrates three core components: (i) large-scale synthetic QA data generation from expert documentation to instill foundational domain knowledge; (ii) a code-centric IR->DPO workflow that converts verified tool decks into interpretable intermediate representations (IR), performs equivalence-preserving diversification, and constructs preference pairs to directly optimize instruction compliance and code executability; and (iii) a controlled evaluation of Retrieval-Augmented Generation (RAG), showing that while RAG benefits general LLMs, it can marginally degrade the performance of already domain-aligned models. We demonstrate the framework by instantiating TcadGPT for semiconductor Technology Computer-Aided Design (TCAD). Using 1.5M synthetic QA pairs and an IR-driven DPO dataset, TcadGPT attains 85.6% semantic accuracy and an 80.0% syntax pass rate on SDE executability tests, substantially outperforming state-of-the-art general LLMs such as GPT-4o. To probe portability beyond TCAD, we apply the same recipe to the open-source FEM solver Elmer, observing consistent improvements in script-level success rates over general-purpose baselines. All datasets, benchmarks, and code (including P1, P2, and IR->DPO) are released for reproducibility. Together, these results suggest that the proposed framework provides a robust and reproducible path toward executable LLMs in specialized, data-scarce professional domains.
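The preference-pair step in component (ii) can be illustrated with a toy sketch. This is not the authors' pipeline, which works on intermediate representations and real tool checks. Everything here is an assumption made for illustration: the function names `balanced_parens` and `build_preference_pairs`, the use of a parenthesis-balance test as a stand-in for an executability check, and the `prompt`/`chosen`/`rejected` record layout commonly used for DPO training data.

```python
from itertools import product
from typing import Callable, Dict, List


def balanced_parens(script: str) -> bool:
    """Hypothetical stand-in for a real tool check: True if parentheses balance."""
    depth = 0
    for ch in script:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                return False
    return depth == 0


def build_preference_pairs(
    prompt: str,
    candidates: List[str],
    is_executable: Callable[[str], bool],
) -> List[Dict[str, str]]:
    """Pair every passing candidate (chosen) with every failing one (rejected)."""
    passing = [c for c in candidates if is_executable(c)]
    failing = [c for c in candidates if not is_executable(c)]
    return [
        {"prompt": prompt, "chosen": good, "rejected": bad}
        for good, bad in product(passing, failing)
    ]


# Illustrative inputs only; these are not real tool scripts.
pairs = build_preference_pairs(
    "Define a rectangular region.",
    ["(a (b))", "(a (b)", "(c)"],
    balanced_parens,
)
```

With these made-up inputs, two candidates pass the toy check and one fails, so two preference pairs are produced, each rejecting the unbalanced script.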

Preprint · 2022 · arXiv

Age of Information with Hybrid-ARQ: A Unified Explicit Result

Delivering timely status updates in a timeliness-critical communication system is of paramount importance to assist accurate and efficient decision making. Therefore, the topic of analyzing Age of Information (AoI) has aroused new research interest. This paper contributes new results in this area by systematically analyzing the AoI of two types of Hybrid Automatic Repeat reQuest (HARQ) techniques that have been newly standardized in the Release-16 5G New Radio (NR) specifications, namely reactive HARQ and proactive HARQ. Under a code-based status update system with non-trivial coding delay, transmission delay, propagation delay, decoding delay, and feedback delay, we derive unified closed-form average AoI and average Peak AoI expressions for reactive HARQ and proactive HARQ, respectively. Based on the obtained explicit expressions, we formulate an AoI minimization problem to investigate the age-optimal codeblock assignment strategy in the finite block-length (FBL) regime. Through case studies and analytical results, we provide comparative insights between reactive HARQ and proactive HARQ from a perspective of freshness of information. The numerical results and optimization solutions show that proactive HARQ draws its strength from both age performance and system robustness, thus enabling the potential to provide new system advancement of a freshness-critical status update system.
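For readers unfamiliar with the two metrics named above, the following minimal sketch computes them empirically from a delivery log. It uses the standard definition of age (current time minus the generation time of the freshest delivered update), not the paper's closed-form HARQ expressions. The function name `age_metrics` and the sample timestamps are assumptions made for illustration, and updates are assumed to be delivered in generation order at distinct times.

```python
from typing import List, Tuple


def age_metrics(updates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (average AoI, average peak AoI) between first and last delivery.

    `updates` holds (generation_time, delivery_time) pairs, assumed sorted
    by delivery time and delivered in generation order.
    """
    if len(updates) < 2:
        raise ValueError("need at least two delivered updates")
    area = 0.0
    peaks = []
    for (gen, dlv), (_, next_dlv) in zip(updates, updates[1:]):
        start_age = dlv - gen      # age right after this update is delivered
        peak_age = next_dlv - gen  # age just before the next delivery
        # Age grows linearly between deliveries, so each segment is a trapezoid.
        area += (next_dlv - dlv) * (start_age + peak_age) / 2.0
        peaks.append(peak_age)
    window = updates[-1][1] - updates[0][1]
    return area / window, sum(peaks) / len(peaks)


# Made-up log: an update generated every 2 time units, each taking 1 unit to arrive.
avg_aoi, avg_peak_aoi = age_metrics([(0, 1), (2, 3), (4, 5)])
```

In this made-up log the age rises from 1 to 3 between consecutive deliveries, so the average AoI is 2.0 and the average peak AoI is 3.0.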

Preprint · 2022 · arXiv

New Upper Bounds on the Error Probability under ML Decoding for Spinal Codes and the Joint Transmission-Decoding System Design

Spinal codes are a type of capacity-achieving rateless codes that have been proved to approach the Shannon capacity over the additive white Gaussian noise (AWGN) channel and the binary symmetric channel (BSC). In this paper, we aim to analyze the bounds on the error probability of Spinal codes and design a joint transmission-decoding system. First, in the finite block-length regime, we derive new upper bounds on the Maximum Likelihood (ML) decoding error probability for Spinal codes over both the AWGN channel and the BSC. Then, based on the derived bounds, we formulate a rate maximization problem. As the solution exhibits an incremental-tail-transmission pattern, we propose an improved transmission scheme, referred to as the thresholded incremental tail transmission (TITT) scheme. Moreover, we also develop a dynamic TITT-matching decoding algorithm, called the bubble decoding with memory (BD-M) algorithm, to reduce the decoding time complexity. The TITT scheme at the transmitter and the BD-M algorithm at the receiver jointly constitute a dynamic transmission-decoding system for Spinal codes, improving their rate performance and decoding throughput. Theoretical analysis and simulation results are provided to verify the superiority of the derived bounds and the proposed joint transmission-decoding system design.
