Researcher profile

Zihao Wu

Zihao Wu contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging - Verification L1 - Unclaimed author
12 works
0 followers
12 topics
4 close collaborators

Published work

12 published items

preprint - 2026 - arXiv

Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models

In recent years, the rapid advancement of large language models (LLMs) in natural language processing has sparked significant interest among researchers to understand their mechanisms and functional characteristics. Although prior studies have attempted to explain LLM functionalities by identifying and interpreting specific neurons, these efforts mostly focus on individual neuron contributions, neglecting the fact that human brain functions are realized through intricate interaction networks. Inspired by research on functional brain networks (FBNs) in the field of neuroscience, we utilize similar methodologies established in FBN analysis to explore the "functional networks" within LLMs in this study. Experimental results highlight that, much like the human brain, LLMs exhibit certain functional networks that recur frequently during their operation. Further investigation reveals that these functional networks are indispensable for LLM performance. Inhibiting key functional networks severely impairs the model's capabilities. Conversely, amplifying the activity of neurons within these networks can enhance either the model's overall performance or its performance on specific tasks. This suggests that these functional networks are strongly associated with either specific tasks or the overall performance of the LLM. Code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.

preprint - 2026 - arXiv

On the origins of oxygen: ALMA and JWST characterise the multi-phase, metal-enriched, star-bursting medium within a 'normal' $z > 11$ galaxy

The unexpectedly high abundance of galaxies at $z > 11$ revealed by JWST has sparked a debate on the nature of early galaxies and the physical mechanisms regulating their formation. The Atacama Large Millimeter/submillimeter Array (ALMA) has begun to provide vital insights on their gas and dust content, but so far only for extreme 'blue monsters'. Here we present new, deep ALMA observations of JADES-GS-z11-0, a more typical (sub-$L^*$) $z > 11$ galaxy that bridges the discovery space of JWST and the Hubble Space Telescope. These data confirm the presence of the [O III] 88 $μ$m line at $4.5σ$ significance, precisely at the redshift of several faint emission lines previously seen with JWST/NIRSpec, while the underlying dust continuum remains undetected ($F_ν< 9.0 \, \mathrm{μJy}$), implying an obscured star formation rate (SFR) of $\text{SFR}_\text{IR} \lesssim 6 \, \mathrm{M_\odot \, yr^{-1}}$ and dust mass of $M_\text{dust} \lesssim 1.0 \times 10^{6} \, \mathrm{M_\odot}$ (all $3σ$). The accurate ALMA redshift of $z_\text{[O III]} = 11.1221 \pm 0.0006$ ($\gtrsim \! 5\times$ refined over NIRSpec) helps confirm that redshifts measured purely from the Lyman-$α$ break, even spectroscopically, should properly take into account the effects of potential damped Lyman-$α$ absorption (DLA) systems to avoid systematic overestimates of up to $Δz \approx 0.5$. The [O III] 88 $μ$m luminosity of $L_\text{[O III]} = (1.1 \pm 0.3) \times 10^{8} \, \mathrm{L_\odot}$, meanwhile, agrees well with the scaling relation for local metal-poor dwarfs given the SFR measured by NIRCam, NIRSpec, and MIRI. The spatially resolved MIRI and ALMA emission also underscores that JADES-GS-z11-0 is likely to consist of two low-mass components that are undergoing strong bursts of star formation yet are already pre-enriched in oxygen (~20-30% solar), only 400 Myr after the Big Bang.

preprint - 2026 - arXiv

Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability

Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counterfactual intervention space and show that EEG predictions are surprisingly unstable under this space: across six datasets spanning four paradigms, up to 42% of trial-level predictions flip when only the preprocessing changes, a variability that standard uncertainty methods do not explicitly quantify because they condition on a fixed preprocessing pipeline. We provide three tools to make this instability measurable, decomposable, and reducible. First, a Walsh-Hadamard decomposition of the 2^7 pipeline space reveals that sensitivity is near-additive in practice under the binary intervention design, enabling efficient step-by-step optimization. Second, we introduce Preprocessing Uncertainty (PU), a per-trial diagnostic that captures a dimension of instability complementary to model-based confidence. Third, we study Normalized Adaptive PGI (NA-PGI), a graph-structured regularizer that exploits the compositional structure of preprocessing interventions as one mitigation strategy with clear scope conditions.
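
The Walsh-Hadamard step in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the score vector, the `weights`, and the function names are hypothetical, and the only detail taken from the abstract is the 2^7 binary design. It transforms one score per on/off pipeline configuration and reports how much of the non-constant energy sits in single-step (main-effect) coefficients; a value near 1 is what "near-additive" would look like under these assumptions.

```python
def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform of a length-2^k list, scaled by 1/n."""
    coeffs = list(values)
    n = len(coeffs)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b
        h *= 2
    return [c / n for c in coeffs]


def additive_fraction(values):
    """Share of non-constant energy carried by single-step coefficients."""
    coeffs = walsh_hadamard(values)
    # Index 0 is the mean; indices with exactly one set bit are main effects.
    total = sum(c * c for c in coeffs[1:])
    main = sum(c * c for idx, c in enumerate(coeffs)
               if bin(idx).count("1") == 1)
    return main / total if total else 1.0


K = 7                              # seven binary preprocessing steps (2^7 configs)
weights = [3, 1, 4, 1, 5, 9, 2]    # made-up per-step effects, for illustration


def bits(idx):
    """On/off state of each step for configuration number idx."""
    return [(idx >> j) & 1 for j in range(K)]


# Toy score that is exactly additive in the seven steps.
additive = [sum(w * b for w, b in zip(weights, bits(i))) for i in range(2 ** K)]
# Same score plus a made-up interaction between steps 0 and 1.
interacting = [v + 6 * (bits(i)[0] & bits(i)[1]) for i, v in enumerate(additive)]

print(additive_fraction(additive))
print(additive_fraction(interacting))
```

When the fraction is close to 1, each step can be tuned on its own with little loss, which is the sense in which additivity would allow step-by-step optimization.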

preprint - 2026 - arXiv

YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts

Mechanical equipment forms the critical backbone of modern industrial production, yet domain shift severely limits the generalization of deep-learning-based fault diagnosis models across different equipment and operating conditions. Inspired by the success of foundation models in achieving zero-shot generalization, we propose YOTOnet (You Only Train Once), a novel architecture specifically designed for cross-domain fault diagnosis in mechanical equipment. YOTOnet comprises three core components: (1) a physics-aware Invariant Feature Distiller that extracts domain-agnostic representations using multi-scale dilated convolutions and FFT-based time-frequency fusion, (2) Domain-Conditioned Sparse Experts (DC-MoE) that adaptively route inputs to specialized processors via learned gating without external metadata, and (3) a dual-head classification system with auxiliary supervision. Extensive validation on five public bearing datasets (CWRU, MFPT, XJTU, OTTAWA, HUST) through 30 cross-dataset protocols demonstrates the superiority of YOTOnet compared with other state-of-the-art methods. Critically, we observe a clear scaling effect: average test F1 improves from 0.5339 (1 training dataset) to 0.705 (4 datasets), with a clear gain when moving from 3 to 4 datasets. These findings provide empirical evidence that foundation model principles can enable robust, train-once deployment for industrial fault diagnosis.

preprint - 2024 - arXiv

Understanding LLMs: A Comprehensive Overview from Training to Inference

The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There's an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of large language model training techniques and inference deployment technologies aligned with this emerging trend. The discussion on training includes various aspects, including data preprocessing, training architecture, pre-training tasks, parallel training, and relevant content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs' utilization and provides insights into their future development.

preprint - 2023 - arXiv

Differentiate ChatGPT-generated and Human-written Medical Texts

Background: Large language models such as ChatGPT are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the Internet. However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to healthcare and the general public. Objective: This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine. We focus on analyzing the differences between medical texts written by human experts and generated by ChatGPT, and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. Methods: We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. In the next step, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT. Results: Medical texts written by humans are more concrete, more diverse, and typically contain more useful information, while medical texts generated by ChatGPT pay more attention to fluency and logic, and usually express general terminologies rather than effective information specific to the context of the problem. A BERT-based model can effectively detect medical texts generated by ChatGPT, and the F1 exceeds 95%.

preprint - 2022 - arXiv

An Elusive Population of Massive Disk Galaxies Hosting Double-lobed Radio-loud AGNs

It is commonly accepted that radio-loud active galactic nuclei are hosted exclusively by giant elliptical galaxies. We analyze high-resolution optical Hubble Space Telescope images of a sample of radio galaxies with extended double-lobed structures associated with disk-like optical counterparts. After systematically evaluating the probability of chance alignment between the radio lobes and the optical counterparts, we obtain a sample of 18 objects likely to have genuine associations. The host galaxies have unambiguous late-type morphologies, including spiral arms, large-scale dust lanes among the edge-on systems, and exceptionally weak bulges, as judged by the low global concentrations, small global Sérsic indices, and low bulge-to-total light ratios (median $B/T = 0.13$). With a median Sérsic index of 1.4 and low effective surface brightnesses, the bulges are consistent with being pseudo bulges. The majority of the hosts have unusually large stellar masses (median $M_* = 1.3\times 10^{11}\, M_\odot$) and red optical colors (median $g-r = 0.69\,$mag), consistent with massive, quiescent galaxies on the red sequence. We suggest that black hole mass (stellar mass) plays a fundamental role in launching large-scale radio jets, and that the rarity of extended radio lobes in late-type galaxies is the consequence of the steep stellar mass function at the high-mass end. The disk radio galaxies have mostly Fanaroff-Riley type II morphologies yet lower radio power than sources of a similar type traditionally hosted by ellipticals. The radio jets show no preferential alignment with the minor axis of the galactic bulge or disk, apart from a possible mild tendency for alignment among the most disk-dominated systems.

preprint - 2022 - arXiv

Coupling Visual Semantics of Artificial Neural Networks and Human Brain Function via Synchronized Activations

Artificial neural networks (ANNs), originally inspired by biological neural networks (BNNs), have achieved remarkable successes in many tasks such as visual representation learning. However, whether there exist semantic correlations/connections between the visual representations in ANNs and those in BNNs remains largely unexplored due to both the lack of an effective tool to link and couple two different domains, and the lack of a general and effective framework for representing the visual semantics in BNNs such as human functional brain networks (FBNs). To answer this question, we propose a novel computational framework, Synchronized Activations (Sync-ACT), to couple the visual representation spaces and semantics between ANNs and BNNs in the human brain based on naturalistic functional magnetic resonance imaging (nfMRI) data. With this approach, we are able to semantically annotate the neurons in ANNs with biologically meaningful descriptions derived from human brain imaging for the first time. We evaluated the Sync-ACT framework on two publicly available movie-watching nfMRI datasets. The experiments demonstrate a) the significant correlation and similarity of the semantics between the visual representations in FBNs and those in a variety of convolutional neural network (CNN) models; b) the close relationship between a CNN's visual representation similarity to BNNs and its performance in image classification tasks. Overall, our study introduces a general and effective paradigm to couple the ANNs and BNNs and provides novel insights for future studies such as brain-inspired artificial intelligence.

preprint - 2022 - arXiv

Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning

Learning harmful shortcuts such as spurious correlations and biases prevents deep neural networks from learning meaningful and useful representations, thus jeopardizing the generalizability and interpretability of the learned representation. The situation becomes even more serious in medical imaging, where the clinical data (e.g., MR images with pathology) are limited and scarce while the reliability, generalizability and transparency of the learned model are highly required. To address this problem, we propose to infuse human experts' intelligence and domain knowledge into the training of deep neural networks. The core idea is that we infuse the visual attention information from expert radiologists to proactively guide the deep model to focus on regions with potential pathology and avoid being trapped in learning harmful shortcuts. To do so, we propose a novel eye-gaze-guided vision transformer (EG-ViT) for diagnosis with limited medical image data. We mask the input image patches that are out of the radiologists' interest and add an additional residual connection in the last encoder layer of EG-ViT to maintain the correlations of all patches. The experiments on two public datasets of INbreast and SIIM-ACR demonstrate our EG-ViT model can effectively learn/transfer experts' domain knowledge and achieve much better performance than baselines. Meanwhile, it successfully rectifies the harmful shortcut learning and significantly improves the EG-ViT model's interpretability. In general, EG-ViT takes advantage of both human experts' prior knowledge and the power of deep neural networks. This work opens new avenues for advancing current artificial intelligence paradigms by infusing human intelligence.

preprint - 2022 - arXiv

Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning

Learning with little data is challenging but often inevitable in various application scenarios where the labeled data is limited and costly. Recently, few-shot learning (FSL) gained increasing attention because of its generalizability of prior knowledge to new tasks that contain only a few samples. However, for data-intensive models such as vision transformer (ViT), current fine-tuning based FSL approaches are inefficient in knowledge generalization and thus degenerate the downstream task performances. In this paper, we propose a novel mask-guided vision transformer (MG-ViT) to achieve an effective and efficient FSL on ViT model. The key idea is to apply a mask on image patches to screen out the task-irrelevant ones and to guide the ViT to focus on task-relevant and discriminative patches during FSL. Particularly, MG-ViT only introduces an additional mask operation and a residual connection, enabling the inheritance of parameters from pre-trained ViT without any other cost. To optimally select representative few-shot samples, we also include an active learning based sample selection method to further improve the generalizability of MG-ViT based FSL. We evaluate the proposed MG-ViT on both Agri-ImageNet classification task and ACFR apple detection task with gradient-weighted class activation mapping (Grad-CAM) as the mask. The experimental results show that the MG-ViT model significantly improves the performance when compared with general fine-tuning based ViT models, providing novel insights and a concrete approach towards generalizing data-intensive and large-scale deep learning models for FSL.
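
As a rough illustration of the patch-masking idea in the abstract above, the sketch below keeps only the highest-scoring patches before they would be passed to a transformer. It is a minimal sketch under stated assumptions, not the MG-ViT code: the per-patch saliency scores stand in for something like a Grad-CAM map, and `keep_ratio`, `select_patches` and `apply_mask` are hypothetical names. The residual connection mentioned in the abstract is not modelled.

```python
def select_patches(saliency, keep_ratio=0.5):
    """Return a 0/1 mask that keeps the highest-saliency patches.

    saliency: one score per image patch (assumed to come from a saliency
    map such as Grad-CAM); keep_ratio is an illustrative parameter.
    """
    n_keep = max(1, round(len(saliency) * keep_ratio))
    ranked = sorted(range(len(saliency)),
                    key=lambda i: saliency[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(saliency))]


def apply_mask(patches, mask):
    """Drop the patches whose mask entry is 0, preserving order."""
    return [p for p, m in zip(patches, mask) if m]


# Toy example: four patches with made-up saliency scores.
patches = ["p0", "p1", "p2", "p3"]
scores = [0.1, 0.9, 0.4, 0.7]
mask = select_patches(scores, keep_ratio=0.5)
kept = apply_mask(patches, mask)
print(mask, kept)
```

A fixed keep ratio is only one way to derive the mask; thresholding the saliency values would serve the same purpose.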

preprint - 2022 - arXiv

On Schwinger pair production between D3 branes

We study the open string pair production between two D3 branes, which gives rise to an effect similar to Schwinger pair production for observers on one of the D3 branes. The D3 branes are placed parallel at a distance, and they carry world-volume electromagnetic fluxes that take a general form. We derive the pair production rate by computing the interaction amplitude between the D3 branes. We discuss how to maximize the pair production rate in this general case. We also mention that the general result can be used to describe other systems such as D3-D1, where the pair production is ultra large compared to the original Schwinger pair production, making it hopeful to observe pair production in experiments.

preprint - 2020 - arXiv

On D-brane interaction & its related properties

We compute the closed-string cylinder amplitude between one Dp brane and the other Dp$\prime$ brane, placed parallel at a separation, with each carrying a general constant worldvolume flux and with $p - p' = 0, 2, 4, 6$ and $p \le 6$. For the $p = p'$, we show that the main part of the amplitude for $p = p' < 5$ is a special case of that for $p = p' = 5$ or $6$ case. For all other $p - p' = 2, 4, 6$ cases, we show that the amplitude is just a special case of the corresponding one for $p = p'$ case. Combining both, we obtain the general formula for the amplitude, which is valid for each of the cases considered and for arbitrary constant worldvolume fluxes. The corresponding general open string one-loop annulus amplitude is also obtained by a Jacobi transformation of the general cylinder one. We give also the general open string pair production rate. We study the properties of the amplitude such as the nature of the interaction, the open string tachyonic instability, and the possible open string pair production and its potential enhancement. In particular, in the presence of pure magnetic fluxes or magnetic-like fluxes, we find that the nature of interaction is correlated with the existence of potential open string tachyonic instability. When the interaction is attractive, there always exists an open string tachyonic instability when the brane separation reaches the minimum determined by the so-called tachyonic shift. When the interaction is repulsive, there is no such instability for any brane separation. We also find that the enhancement of open string pair production, in the presence of pure electric fluxes, can occur only for the $p - p' = 2$ case.