Researcher profile

Li Zhu

Li Zhu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

Deep Research Agents (DRAs) generate citation-rich reports via multi-step search and synthesis, yet existing benchmarks mainly target text-only settings or short-form multimodal QA, missing end-to-end multimodal evidence use. We introduce MMDeepResearch-Bench (MMDR-Bench), a benchmark of 140 expert-crafted tasks across 21 domains, where each task provides an image-text bundle to evaluate multimodal understanding and citation-grounded report generation. Compared to prior setups, MMDR-Bench emphasizes report-style synthesis with explicit evidence use, where models must connect visual artifacts to sourced claims and maintain consistency across narrative, citations, and visual references. We further propose a unified, interpretable evaluation pipeline: Formula-LLM Adaptive Evaluation (FLAE) for report quality, Trustworthy Retrieval-Aligned Citation Evaluation (TRACE) for citation-grounded evidence alignment, and Multimodal Support-Aligned Integrity Check (MOSAIC) for text-visual integrity, each producing fine-grained signals that support error diagnosis beyond a single overall score. Experiments across 25 state-of-the-art models reveal systematic trade-offs between generation quality, citation discipline, and multimodal grounding, highlighting that strong prose alone does not guarantee faithful evidence use and that multimodal integrity remains a key bottleneck for deep research agents.

preprint2026arXiv

Physiology-Aware Masked Cross-Modal Reconstruction for Biosignal Representation Learning

Biosignals acquired from different locations on the body often provide temporally ordered views of the same underlying physiological process. However, most existing self supervised learning methods treat these signals as interchangeable views, overlooking the directional temporal dynamics that link them. A canonical example is the relationship between electrocardiography (ECG), which captures the electrical activation initiating each heartbeat, and photoplethysmography (PPG), which records the resulting peripheral pulse delayed by vascular dynamics. To capture this structured relationship, we introduce xMAE, a biosignal pretraining framework that leverages masked cross modal reconstruction across temporally ordered biosignals as a training time constraint to encourage physiologically meaningful timing structure in the learned representations. We show that pretraining with xMAE yields representations that outperform both unimodal and multimodal baselines on 15 of 19 downstream tasks, including cardiovascular outcome prediction, abnormal laboratory test detection, sleep staging, and demographic inference, while generalizing across devices, body locations, and acquisition settings. Further analysis suggests that the ECG PPG timing structure is reflected in the learned PPG representations. More broadly, xMAE demonstrates the effectiveness of incorporating temporal structure into multimodal pretraining when signals observe different stages of a shared underlying process. Code is available at https://github.com/hzhou3/xMAE.

preprint2026arXiv

Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models

Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning - a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is designed to reconstruct randomly masked out coefficients obtained from a wavelet-based multiresolution decomposition of PPG signals, forcing the transformer encoder to integrate information across temporal and spectral scales. We pretrain our model with MMR using ~17 million unlabeled 10-second PPG segments from ~32,000 smartwatch users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG foundation models, time-series foundation models, and other self-supervised baselines. Extensive analysis of our learned embeddings and systematic ablations underscores the value of wavelet-based representations, showing that they capture robust and physiologically-grounded features. Together, these results highlight the potential of MMR as a step toward generalizable PPG foundation models.

preprint2025arXiv

SoK: Web3 RegTech for Cryptocurrency VASP AML/CFT Compliance

The decentralized architecture of Web3 technologies creates fundamental challenges for Anti-Money Laundering and Counter-Financing of Terrorism compliance. Traditional regulatory technology solutions designed for centralized financial systems prove inadequate for blockchain's transparent yet pseudonymous networks. This systematization examines how blockchain-native RegTech solutions leverage distributed ledger properties to enable novel compliance capabilities. We develop three taxonomies organizing the Web3 RegTech domain: a regulatory paradigm evolution framework across ten dimensions, a compliance protocol taxonomy encompassing five verification layers, and a RegTech lifecycle framework spanning preventive, real-time, and investigative phases. Through analysis of 41 operational commercial platforms and 28 academic prototypes selected from systematic literature review (2015-2025), we demonstrate that Web3 RegTech enables transaction graph analysis, real-time risk assessment, cross-chain analytics, and privacy-preserving verification approaches that are difficult to achieve or less commonly deployed in traditional centralized systems. Our analysis reveals critical gaps between academic innovation and industry deployment, alongside persistent challenges in cross-chain tracking, DeFi interaction analysis, privacy protocol monitoring, and scalability. We synthesize architectural best practices and identify research directions addressing these gaps while respecting Web3's core principles of decentralization, transparency, and user sovereignty.

preprint2022arXiv

A filtering technique for the matrix power series being near-sparse

This work presents a new algorithm for matrix power series which is near-sparse, that is, there are a large number of near-zero elements in it. The proposed algorithm uses a filtering technique to improve the sparsity of the matrices involved in the calculation process of the Paterson-Stockmeyer (PS) scheme. Based on the error analysis considering the transaction error and the error introduced by filtering, the proposed algorithm can obtain similar accuracy as the original PS scheme but is more efficient than it. For the near-sparse matrix power series, the proposed method is also more efficient than the MATLAB built-in codes.

preprint2022arXiv

A new stable and avoiding inversion iteration for computing matrix square root

The objective of this research was to compute the principal matrix square root with sparse approximation. A new stable iterative scheme avoiding fully matrix inversion (SIAI) is provided. The analysis on the sparsity and error of the matrices involved during the iterative process is given. Based on the bandwidth and error analysis, a more efficient algorithm combining the SIAI with the filtering technique is proposed. The high computational efficiency and accuracy of the proposed method are demonstrated by computing the principal square roots of different matrices to reveal its applicability over the existing methods.

preprint2022arXiv

Progressive Glass Segmentation

Glass is very common in the real world. Influenced by the uncertainty about the glass region and the varying complex scenes behind the glass, the existence of glass poses severe challenges to many computer vision tasks, making glass segmentation as an important computer vision task. Glass does not have its own visual appearances but only transmit/reflect the appearances of its surroundings, making it fundamentally different from other common objects. To address such a challenging task, existing methods typically explore and combine useful cues from different levels of features in the deep network. As there exists a characteristic gap between level-different features, i.e., deep layer features embed more high-level semantics and are better at locating the target objects while shallow layer features have larger spatial sizes and keep richer and more detailed low-level information, fusing these features naively thus would lead to a sub-optimal solution. In this paper, we approach the effective features fusion towards accurate glass segmentation in two steps. First, we attempt to bridge the characteristic gap between different levels of features by developing a Discriminability Enhancement (DE) module which enables level-specific features to be a more discriminative representation, alleviating the features incompatibility for fusion. Second, we design a Focus-and-Exploration Based Fusion (FEBF) module to richly excavate useful information in the fusion process by highlighting the common and exploring the difference between level-different features.

preprint2022arXiv

ReGO: Reference-Guided Outpainting for Scenery Image

We aim to tackle the challenging yet practical scenery image outpainting task in this work. Recently, generative adversarial learning has significantly advanced the image outpainting by producing semantic consistent content for the given image. However, the existing methods always suffer from the blurry texture and the artifacts of the generative part, making the overall outpainting results lack authenticity. To overcome the weakness, this work investigates a principle way to synthesize texture-rich results by borrowing pixels from its neighbors (i.e., reference images), named \textbf{Re}ference-\textbf{G}uided \textbf{O}utpainting (ReGO). Particularly, the ReGO designs an Adaptive Content Selection (ACS) module to transfer the pixel of reference images for texture compensating of the target one. To prevent the style of the generated part from being affected by the reference images, a style ranking loss is further proposed to augment the ReGO to synthesize style-consistent results. Extensive experiments on two popular benchmarks, NS6K \cite{yangzx} and NS8K \cite{wang}, well demonstrate the effectiveness of our ReGO. Our code will be made public available.

preprint2021arXiv

Sketch-Guided Scenery Image Outpainting

The outpainting results produced by existing approaches are often too random to meet users' requirement. In this work, we take the image outpainting one step forward by allowing users to harvest personal custom outpainting results using sketches as the guidance. To this end, we propose an encoder-decoder based network to conduct sketch-guided outpainting, where two alignment modules are adopted to impose the generated content to be realistic and consistent with the provided sketches. First, we apply a holistic alignment module to make the synthesized part be similar to the real one from the global view. Second, we reversely produce the sketches from the synthesized part and encourage them be consistent with the ground-truth ones using a sketch alignment module. In this way, the learned generator will be imposed to pay more attention to fine details and be sensitive to the guiding sketches. To our knowledge, this work is the first attempt to explore the challenging yet meaningful conditional scenery image outpainting. We conduct extensive experiments on two collected benchmarks to qualitatively and quantitatively validate the effectiveness of our approach compared with the other state-of-the-art generative models.

preprint2020arXiv

DONet: Dual Objective Networks for Skin Lesion Segmentation

Skin lesion segmentation is a crucial step in the computer-aided diagnosis of dermoscopic images. In the last few years, deep learning based semantic segmentation methods have significantly advanced the skin lesion segmentation results. However, the current performance is still unsatisfactory due to some challenging factors such as large variety of lesion scale and ambiguous difference between lesion region and background. In this paper, we propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve the skin lesion segmentation. Our DONet adopts two symmetric decoders to produce different predictions for approaching different objectives. Concretely, the two objectives are actually defined by different loss functions. In this way, the two decoders are encouraged to produce differentiated probability maps to match different optimization targets, resulting in complementary predictions accordingly. The complementary information learned by these two objectives are further aggregated together to make the final prediction, by which the uncertainty existing in segmentation maps can be significantly alleviated. Besides, to address the challenge of large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM) to model the complex correlation among skin lesions, where the features with different scale contexts are efficiently integrated to form a more robust representation. Extensive experiments on two popular benchmarks well demonstrate the effectiveness of the proposed DONet. In particular, our DONet achieves 0.881 and 0.931 dice score on ISIC 2018 and $\text{PH}^2$, respectively. Code will be made public available.

preprint2020arXiv

Localization-aware Channel Pruning for Object Detection

Channel pruning is one of the important methods for deep model compression. Most of existing pruning methods mainly focus on classification. Few of them conduct systematic research on object detection. However, object detection is different from classification, which requires not only semantic information but also localization information. In this paper, based on discrimination-aware channel pruning (DCP) which is state-of-the-art pruning method for classification, we propose a localization-aware auxiliary network to find out the channels with key information for classification and regression so that we can conduct channel pruning directly for object detection, which saves lots of time and computing resources. In order to capture the localization information, we first design the auxiliary network with a contextual ROIAlign layer which can obtain precise localization information of the default boxes by pixel alignment and enlarges the receptive fields of the default boxes when pruning shallow layers. Then, we construct a loss function for object detection task which tends to keep the channels that contain the key information for classification and regression. Extensive experiments demonstrate the effectiveness of our method. On MS COCO, we prune 70\% parameters of the SSD based on ResNet-50 with modest accuracy drop, which outperforms the-state-of-art method.

preprint2020arXiv

Prediction of an Extended Ferroelectric Clathrate

Using first-principles calculations, we predict a lightweight room-temperature ferroelectric carbon-boron framework in a host/guest clathrate structure. This ferroelectric clathrate, with composition ScB$_3$C$_3$, exhibits high polarization density and low mass density compared with widely used commercial ferroelectrics. Molecular dynamics simulations show spontaneous polarization with a moderate above-room-temperature T$_c$ of $\sim$370 K, which implies large susceptibility and possibly large electrocaloric and piezoelectric constants at room temperature. Our findings open the possibility for a new class of ferroelectric materials with potential across a broad range of applications.

preprint2019arXiv

Realization of metallic state in 1T-TaS2 with persisting long-range order of charge density wave

Metallization of 1T-TaS2 is generally initiated at the domain boundary of charge density wave (CDW), at the expense of its long-range order. However, we demonstrate in this study that the metallization of 1T-TaS2 can be also realized without breaking the long-range CDW order upon surface alkali doping. By using scanning tunneling microscopy, we find the long-range CDW order is always persisting, and the metallization is instead associated with additional in-gap excitations. Interestingly, the in-gap excitation is near the top of the lower Hubbard band, in contrast to a conventional electron-doped Mott insulator where it is beneath the upper Hubbard band. In combination with the numerical calculations, we suggest that the appearance of the in-gap excitations near the lower Hubbard band is mainly due to the effectively reduced on-site Coulomb energy by the adsorbed alkali ions.