Researcher profile

Di Wang

Di Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

A Generalizable Framework for Building Executable Domain-Specific LLMs under Data Scarcity: Demonstration on Semiconductor TCAD Simulation

Scientific and engineering verticals often suffer from data scarcity and strict executability requirements: models must generate not only fluent text, but also syntactically valid, tool-compilable scripts. We present a schema-first alignment framework for building compact, executable domain-specific LLMs in low-resource settings. The framework integrates three core components: (i) large-scale synthetic QA data generation from expert documentation to instill foundational domain knowledge; (ii) a code-centric IR->DPO workflow that converts verified tool decks into interpretable intermediate representations (IR), performs equivalence-preserving diversification, and constructs preference pairs to directly optimize instruction compliance and code executability; and (iii) a controlled evaluation of Retrieval-Augmented Generation (RAG), showing that while RAG benefits general LLMs, it can marginally degrade the performance of already domain-aligned models. We demonstrate the framework by instantiating TcadGPT for semiconductor Technology Computer-Aided Design (TCAD). Using 1.5M synthetic QA pairs and an IR-driven DPO dataset, TcadGPT attains 85.6% semantic accuracy and an 80.0% syntax pass rate on SDE executability tests, substantially outperforming state-of-the-art general LLMs such as GPT-4o. To probe portability beyond TCAD, we apply the same recipe to the open-source FEM solver Elmer, observing consistent improvements in script-level success rates over general-purpose baselines. All datasets, benchmarks, and code (including P1, P2, and IR->DPO) are released for reproducibility. Together, these results suggest that the proposed framework provides a robust and reproducible path toward executable LLMs in specialized, data-scarce professional domains.

preprint2026arXiv

A Hybrid Framework Combining Autoregression and Common Factors for Matrix Time Series

Matrix-valued time series are ubiquitous in modern economics and finance, yet modeling them requires navigating a trade-off between flexibility and parsimony. We propose the Matrix Autoregressive model with Common Factors (MARCF), a unified framework for high-dimensional matrix time series that bridges the structural gap between the Matrix Autoregression (MAR) and Matrix Factor Model (MFM). While MAR typically assumes distinct predictor and response subspaces and MFM enforces identical ones, MARCF explicitly characterizes the intersection of these subspaces. By decomposing the coefficient matrices into common, predictor-specific, and response-specific components, the framework accommodates distinct input and output structures while exploiting their overlap for dimension reduction. We develop a regularized gradient descent estimator that is scalable for high-dimensional data and can efficiently handle the non-convex parameter space. Theoretical analysis establishes local linear convergence of the algorithm and statistical consistency of the estimator under high-dimensional scaling. The estimation efficiency and interpretability of the proposed methods are demonstrated through simulations and an application to global macroeconomic forecasting.

preprint2026arXiv

Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation

We present a comprehensive evaluation framework for assessing Large Language Models' (LLMs) capabilities in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce \ourdata, a novel dataset of 1,308 test cases built upon psychological frameworks including D/S-IAT and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with 8 widely used LLMs under different contextual settings, we find that current models struggle significantly with detecting implicit suicidal ideation and providing appropriate support, highlighting crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches in developing and evaluating LLMs for sensitive psychological applications.

preprint2026arXiv

Controllable Concept Bottleneck Models

Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on static scenarios where the data and concepts are assumed to be fixed and clean. In real-world applications, deployed models require continuous maintenance: we often need to remove erroneous or sensitive data (unlearning), correct mislabeled concepts, or incorporate newly acquired samples (incremental learning) to adapt to evolving environments. Thus, deriving efficient editable CBMs without retraining from scratch remains a significant challenge, particularly in large-scale applications. To address these challenges, we propose Controllable Concept Bottleneck Models (CCBMs). Specifically, CCBMs support three granularities of model editing: concept-label-level, concept-level, and data-level, the latter of which encompasses both data removal and data addition. CCBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for retraining. Experimental results demonstrate the efficiency and adaptability of our CCBMs, affirming their practical value in enabling dynamic and trustworthy CBMs.

preprint2026arXiv

Graph-Augmented Topological Internalization with Dual-Stream Classifiers for Medical Report Generation

Automated medical report generation, MRG, holds substantial value for alleviating radiologist workload and enhancing diagnostic efficiency. However, mainstream approaches typically treat diverse chest abnormalities as isolated classification targets. This paradigm often overlooks inherent disease co-occurrences and struggles to translate medical topological structures into explicit data correlations, constraining the model's reasoning capacity on complex or subtle lesions. To address this, we propose a Graph-Augmented Dual-Stream Medical Report Generation with Topological Internalization, GDMRG. Our framework introduces a Topological Knowledge Internalization module, TKI, which leverages a Graph Convolutional Network, GCN, to generate an explicit parameterized weight matrix based on global disease co-occurrence priors. This facilitates efficient topological knowledge injection without relying on external retrieval mechanisms. Building upon this, we construct a dual-stream classification system: the main branch generates discrete diagnostic prompts under topological constraints, while the auxiliary branch employs an asymmetric optimization strategy to dynamically calibrate decision boundaries for highly imbalanced samples. Concurrently, to establish a logical closed loop between diagnosis and visual grounding, we design a diagnostic-driven Diagnosis-Guided Spatial Attention, DGSA, that utilizes high-dimensional clinical semantics to recalibrate the visual encoder, mitigating feature hallucinations. Comprehensive experiments on the MIMIC-CXR dataset demonstrate that GDMRG achieves competitive clinical efficacy, CE, while maintaining natural language fluency. Furthermore, our model exhibits robust zero-shot generalization on the IU X-Ray dataset. In summary, this work presents an integrated and interpretable paradigm for medical report generation.

preprint2026arXiv

Investigating CoT Monitorability in Large Reasoning Models

Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks by engaging in extended reasoning before producing final answers. Beyond improving abilities, these detailed reasoning traces also create a new opportunity for AI safety, CoT Monitorability: monitoring potential model misbehavior, such as the use of shortcuts or sycophancy, through their chain-of-thought (CoT) during decision-making. However, two key fundamental challenges arise when attempting to build more effective monitors through CoT analysis. First, as prior research on CoT faithfulness has pointed out, models do not always truthfully represent their internal decision-making in the generated reasoning. Second, monitors themselves may be either overly sensitive or insufficiently sensitive, and can potentially be deceived by models' long, elaborate reasoning traces. In this paper, we present the first systematic investigation of the challenges and potential of CoT monitorability. Motivated by two fundamental challenges we mentioned before, we structure our study around two central perspectives: (i) verbalization: to what extent do LRMs faithfully verbalize the true factors guiding their decisions in the CoT, and (ii) monitor reliability: to what extent can misbehavior be reliably detected by a CoT-based monitor? Specifically, we provide empirical evidence and correlation analyses between verbalization quality, monitor reliability, and LLM performance across mathematical, scientific, and ethical domains. Then we further investigate how different CoT intervention methods, designed to improve reasoning efficiency or performance, will affect monitoring effectiveness. Finally, we propose MoME, a new paradigm in which LLMs monitor other models' misbehavior through their CoT and provide structured judgments along with supporting evidence.

preprint2026arXiv

Minimax Optimal Robust Sparse Regression with Heavy-Tailed Designs: A Gradient-Based Approach

We investigate high-dimensional sparse regression when both the noise and the design matrix exhibit heavy-tailed behavior. Standard algorithms typically fail in this regime, as heavy-tailed covariates distort the empirical risk geometry. We propose a unified framework, Robust Iterative Gradient descent with Hard Thresholding (RIGHT), which employs a robust gradient estimator to bypass the need for higher-order moment conditions. Our analysis reveals a fundamental decoupling phenomenon: in linear regression, the estimation error rate is governed by the noise tail index, while the sample complexity required for stability is governed by the design tail index. This implies that while heavy-tailed noise limits precision, heavy-tailed designs primarily raise the sample size barrier for convergence. In contrast, for logistic regression, we show that the bounded gradient naturally robustifies the estimator against heavy-tailed designs, restoring standard parametric rates. We derive matching minimax lower bounds to prove that RIGHT achieves optimal estimation accuracy and sample complexity across these regimes, without requiring sample splitting or the existence of the population risk.

preprint2026arXiv

Reduced-Rank Autoregressive Model for High-Dimensional Multivariate Network Time Series

Multivariate network time series are ubiquitous in modern systems, yet existing network autoregressive models typically treat nodes as scalar processes, ignoring cross-variable spillovers. To capture these complex interactions without the curse of dimensionality, we propose the Reduced-Rank Network Autoregressive (RRNAR) model. Our framework introduces a separable bilinear transition structure that couples the known network topology with a learnable low-rank variable subspace. We estimate the model using a novel Scaled Gradient Descent (ScaledGD) algorithm, explicitly designed to bridge the gap between rigid network scalars and flexible factor components. Theoretically, we establish non-asymptotic error bounds under a novel distance metric. A key finding is a network-induced blessing of dimensionality: for sparse networks, the estimation accuracy for network parameters improves as the network size grows. Applications to traffic and server monitoring networks demonstrate that RRNAR significantly outperforms univariate and unstructured benchmarks by identifying latent cross-channel propagation mechanisms.

preprint2026arXiv

Second-Order Convergence in Private Stochastic Non-Convex Optimization

We investigate the problem of finding second-order stationary points (SOSP) in differentially private (DP) stochastic non-convex optimization. Existing methods suffer from two key limitations: (i) inaccurate convergence error rate due to overlooking gradient variance in the saddle point escape analysis, and (ii) dependence on auxiliary private model selection procedures for identifying DP-SOSP, which can significantly impair utility, particularly in distributed settings. To address these issues, we propose a generic perturbed stochastic gradient descent (PSGD) framework built upon Gaussian noise injection and general gradient oracles. A core innovation of our framework is using model drift distance to determine whether PSGD escapes saddle points, ensuring convergence to approximate local minima without relying on second-order information or additional DP-SOSP identification. By leveraging the adaptive DP-SPIDER estimator as a specific gradient oracle, we develop a new DP algorithm that rectifies the convergence error rates reported in prior work. We further extend this algorithm to distributed learning with heterogeneous data, providing the first formal guarantees for finding DP-SOSP in such settings. Our analysis also highlights the detrimental impacts of private selection procedures in distributed learning under high-dimensional models, underscoring the practical benefits of our design. Numerical experiments on real-world datasets validate the efficacy of our approach.

preprint2026arXiv

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

Transformers are the cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advancements in Mamba, a state space model (SSM) with linear complexity, offer promising efficiency gains but suffer from unstable contextual learning and multitask generalization. Some works conduct layer-level hybrid structures that combine Transformer and Mamba layers, aiming to make full use of both advantages. This paper proposes TransMamba, a novel sequence-level hybrid framework that unifies Transformer and Mamba through shared parameter matrices (QKV and CBx), and thus could dynamically switch between attention and SSM mechanisms at different token lengths and layers. We design the Memory Converter to bridge Transformer and Mamba by converting attention outputs into SSM-compatible states, ensuring seamless information flow at TransPoints where the transformation happens. The TransPoint scheduling is also thoroughly explored for balancing effectiveness and efficiency. We conducted extensive experiments demonstrating that TransMamba achieves superior training efficiency and performance compared to single and hybrid baselines, and validated the deeper consistency between Transformer and Mamba paradigms at sequence level, offering a scalable solution for next-generation language modeling. Code and data are available at https://github.com/Yixing-Li/TransMamba

preprint2025arXiv

Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning

Change detection visual question answering (CDVQA) requires answering text queries by reasoning about semantic changes in bi-temporal remote sensing images. A straightforward approach is to boost CDVQA performance with generic vision-language models via supervised fine-tuning (SFT). Despite recent progress, we observe that a significant portion of failures do not stem from clearly incorrect predictions, but from decision ambiguity, where the model assigns similar confidence to the correct answer and strong distractors. To formalize this challenge, we define Decision-Ambiguous Samples (DAS) as instances with a small probability margin between the ground-truth answer and the most competitive alternative. We argue that explicitly optimizing DAS is crucial for improving the discriminability and robustness of CDVQA models. To this end, we propose DARFT, a Decision-Ambiguity-guided Reinforcement Fine-Tuning framework that first mines DAS using an SFT-trained reference policy and then applies group-relative policy optimization on the mined subset. By leveraging multi-sample decoding and intra-group relative advantages, DARFT suppresses strong distractors and sharpens decision boundaries without additional supervision. Extensive experiments demonstrate consistent gains over SFT baselines, particularly under few-shot settings.